Cluster Monitoring

    Available in VPC

    Note

    Cluster Monitoring is available for Kubernetes version 1.23 or later.
    To use it on a cluster running a version earlier than 1.23, first upgrade the cluster to a supported version using the upgrade feature.

    Cluster Monitoring collects resource usage data from a Kubernetes cluster and sends it to Cloud Insight. You can view the collected data on the Grafana dashboard or the Cloud Insight dashboard in Ncloud Kubernetes Service on the NAVER Cloud Platform console, which lets you check resource usage without a separate monitoring tool.

    Supported versions

    Feature                          Kubernetes 1.22    Kubernetes 1.23+
    Kubernetes Cluster Monitoring    Not supported      Supported
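    The version rule in the table above can be expressed as a simple check. This is a minimal sketch: the function name is hypothetical, and it assumes version strings of the form `v1.23.9` or `1.27`, as printed for example by `kubectl version`. It only compares the major and minor numbers.

```python
import re

# Minimum Kubernetes (major, minor) version that supports Cluster Monitoring, per the table above.
MIN_SUPPORTED = (1, 23)


def supports_cluster_monitoring(version: str) -> bool:
    """Return True if a version string such as 'v1.23.9' or '1.27' is 1.23 or later."""
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison checks major first, then minor.
    return (major, minor) >= MIN_SUPPORTED


for v in ["v1.22.17", "1.23.9", "v1.27.3"]:
    print(v, supports_cluster_monitoring(v))
```

    A cluster for which this returns False needs to be upgraded before Cluster Monitoring can be used.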

    Usage fee

    This service offers the same pricing plan as Cloud Insight.
    For more details about the Cloud Insight pricing plan, see Service > Cloud Insight on the NAVER Cloud Platform portal.

    Collection target

    Note

    Monitoring for pods within the kube-system namespace is not supported.

    Kubernetes cluster monitoring collects metrics of the types defined below.

    Type                Description
    CpuMem              The pod's CPU/MEM resource usage
    Network             The pod's network in/out rate
    Disk                The pod's Block Storage resource usage
    NodeAvailability    Available node capacity within the cluster
    PodStatus           Status of pods

    Type: CpuMem

    Dimensions

    Dimension         Description
    type              Defined as CpuMem
    clusterUUID       Cluster UUID
    nodeInstanceNo    The instance number of the node where the pod is located
    namespace         The namespace where the pod is located
    controller        The controller that created the pod
    pod               The pod's name

    Metrics

    Metric                  Description
    real_cpu                The pod's CPU usage
    real_mem                The pod's memory usage
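    Each collected data point combines a type, a set of dimensions, and metric values. The sketch below models that shape locally and sums a metric per dimension, for example total real_cpu per namespace. The MetricSample class, the helper name, and all values are hypothetical; the units of real_cpu and real_mem are not stated in this document.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MetricSample:
    """Hypothetical local shape for one data point: a type, its dimensions, and its metrics."""
    type: str
    dimensions: dict[str, str]
    metrics: dict[str, float]


def total_by_dimension(samples, metric, dimension):
    """Sum one metric (e.g. real_cpu) across samples, grouped by one dimension (e.g. namespace)."""
    totals = defaultdict(float)
    for sample in samples:
        # Skip samples that do not carry the requested metric or dimension.
        if metric in sample.metrics and dimension in sample.dimensions:
            totals[sample.dimensions[dimension]] += sample.metrics[metric]
    return dict(totals)


# Illustrative values only.
samples = [
    MetricSample("CpuMem", {"clusterUUID": "example-uuid", "namespace": "web", "pod": "web-1"},
                 {"real_cpu": 0.25, "real_mem": 128.0}),
    MetricSample("CpuMem", {"clusterUUID": "example-uuid", "namespace": "web", "pod": "web-2"},
                 {"real_cpu": 0.5, "real_mem": 256.0}),
    MetricSample("CpuMem", {"clusterUUID": "example-uuid", "namespace": "batch", "pod": "job-1"},
                 {"real_cpu": 1.0, "real_mem": 512.0}),
]
print(total_by_dimension(samples, "real_cpu", "namespace"))  # {'web': 0.75, 'batch': 1.0}
```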

    Type: Network

    Dimensions

    Dimension         Description
    type              Defined as Network
    clusterUUID       Cluster UUID
    nodeInstanceNo    The instance number of the node where the pod is located
    namespace         The namespace where the pod is located
    controller        The controller that created the pod
    pod               The pod's name
    interface         The network interface on which the inbound/outbound traffic occurred

    Metrics

    Metric                  Description
    network_rx_bytes        Number of bytes received
    network_tx_bytes        Number of bytes sent
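    The Network type is described as an in/out rate, but this document does not say whether network_rx_bytes and network_tx_bytes are per-interval values or cumulative counters. If you export them and they turn out to be cumulative counters, a rate can be derived from two samples as below. This is a sketch under that assumption; the function name and the counter-reset handling are hypothetical choices.

```python
def bytes_per_second(prev_bytes: float, curr_bytes: float, interval_seconds: float) -> float:
    """Convert two samples of a cumulative byte counter into an average rate.

    Assumes the counter only grows, except when it resets (e.g. the pod restarts).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # Counter reset: count only what accumulated since the reset.
        delta = curr_bytes
    return delta / interval_seconds


print(bytes_per_second(1_000_000, 1_600_000, 60))  # 10000.0
print(bytes_per_second(500_000, 120_000, 60))      # counter reset -> 2000.0
```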

    Type: Disk

    Dimensions

    Dimension         Description
    type              Defined as Disk
    clusterUUID       Cluster UUID
    nodeInstanceNo    The instance number of the node where the pod is located
    namespace         The namespace where the pod is located
    controller        The controller that created the pod
    pod               The pod's name
    pvc               The name of the PersistentVolumeClaim (PVC)

    Metrics

    Metric                  Description
    available_bytes         Available disk space, in bytes
    capacity_bytes          Total disk capacity, in bytes
    used_bytes              Used disk space, in bytes
    disk_used_ratio         Ratio of disk space used, between 0 and 1
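    The metric name suggests that disk_used_ratio is used_bytes divided by capacity_bytes. This document does not state the formula, so the sketch below is an assumption that can be handy for cross-checking exported values. The function name is hypothetical, and clamping to 0..1 mirrors the documented range.

```python
def disk_used_ratio(used_bytes: int, capacity_bytes: int) -> float:
    """Return used_bytes / capacity_bytes, clamped to the documented 0..1 range (assumed formula)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return min(max(used_bytes / capacity_bytes, 0.0), 1.0)


GIB = 1024 ** 3
print(disk_used_ratio(used_bytes=30 * GIB, capacity_bytes=100 * GIB))  # 0.3
```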

    Type: NodeAvailability

    Dimensions

    Dimension         Description
    type              Defined as NodeAvailability
    clusterUUID       Cluster UUID

    Metrics

    Metric                  Description
    node_total_count        Total number of nodes in the cluster
    node_ready_count        Number of nodes in the cluster that are Ready
    node_not_ready_count    Number of nodes in the cluster that are not Ready
    node_available_ratio    node_ready_count / node_total_count, between 0 and 1
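    The NodeAvailability metrics follow directly from each node's Ready condition, and node_available_ratio is node_ready_count / node_total_count as stated above. The sketch below computes the four values from a list of Ready condition statuses, which you could gather yourself, for example from `node.status.conditions` of the nodes returned by the Kubernetes Python client's `CoreV1Api().list_node()`. Counting "Unknown" as not ready and returning 0.0 for an empty cluster are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class NodeAvailability:
    node_total_count: int
    node_ready_count: int
    node_not_ready_count: int
    node_available_ratio: float


def node_availability(ready_statuses: list[str]) -> NodeAvailability:
    """Compute NodeAvailability-style values from each node's Ready condition status.

    Kubernetes reports the Ready condition status as "True", "False", or "Unknown";
    this sketch counts anything other than "True" as not ready.
    """
    total = len(ready_statuses)
    ready = sum(1 for status in ready_statuses if status == "True")
    ratio = ready / total if total else 0.0  # assumption: 0.0 for an empty cluster
    return NodeAvailability(total, ready, total - ready, ratio)


print(node_availability(["True", "True", "True", "False"]))
# NodeAvailability(node_total_count=4, node_ready_count=3, node_not_ready_count=1, node_available_ratio=0.75)
```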

    Type: PodStatus

    Dimensions

    Dimension         Description
    type              Defined as PodStatus
    clusterUUID       Cluster UUID
    namespace         The namespace where the pod is located
    controller        The controller that created the pod

    Metrics

    Metric                  Description
    pod_phase_pending       Number of the controller's pods in the Pending phase
    pod_phase_running       Number of the controller's pods in the Running phase
    pod_phase_succeeded     Number of the controller's pods in the Succeeded phase
    pod_phase_failed        Number of the controller's pods in the Failed phase
    pod_available_ratio     Ratio of available pods belonging to the controller, between 0 and 1 (supported for ReplicaSet, DaemonSet, and StatefulSet)
    pod_restart_count       Sum of the restart counts of the pods belonging to the controller
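    The PodStatus metrics are aggregated per namespace and controller. The sketch below shows that aggregation for the phase counts and pod_restart_count over a hypothetical list of pod records. The record keys and function names are assumptions, pod_available_ratio is left out because its exact definition of "available" is not given here, and pods in the Unknown phase are not counted in any phase metric.

```python
from collections import defaultdict

PHASES = ("Pending", "Running", "Succeeded", "Failed")


def empty_metrics() -> dict[str, int]:
    """Zeroed PodStatus-style counters for one (namespace, controller) group."""
    metrics = {f"pod_phase_{phase.lower()}": 0 for phase in PHASES}
    metrics["pod_restart_count"] = 0
    return metrics


def pod_status_metrics(pods):
    """Group pods by (namespace, controller) and count phases and restarts.

    Each pod is a dict with hypothetical keys: namespace, controller, phase, restarts.
    """
    grouped = defaultdict(empty_metrics)
    for pod in pods:
        metrics = grouped[(pod["namespace"], pod["controller"])]
        phase_key = f"pod_phase_{pod['phase'].lower()}"
        if phase_key in metrics:  # phases outside PHASES (e.g. Unknown) are skipped
            metrics[phase_key] += 1
        metrics["pod_restart_count"] += pod["restarts"]
    return dict(grouped)


pods = [
    {"namespace": "default", "controller": "web", "phase": "Running", "restarts": 1},
    {"namespace": "default", "controller": "web", "phase": "Running", "restarts": 2},
    {"namespace": "default", "controller": "web", "phase": "Pending", "restarts": 0},
    {"namespace": "batch", "controller": "nightly", "phase": "Failed", "restarts": 3},
]
for key, metrics in pod_status_metrics(pods).items():
    print(key, metrics)
```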

    Grafana

    Grafana is a tool for visualizing time series data. You can view the data sent to Cloud Insight on your Grafana dashboard.
    You can find the Grafana link in the Monitoring section under Clusters in Ncloud Kubernetes Service on the NAVER Cloud Platform console.

    Overview

    The Overview shows the list of your clusters. Use the following buttons to monitor resources within a cluster.

    • Nodes: you can view the availability of in-cluster worker nodes and resource usage of CPU, memory, disk, and network.
    • Pods: you can view the resource usage of CPU, memory, disk, and network for in-cluster pods.

    Alerts

    You can set thresholds for resources within the cluster and configure event rules to receive alerts when specific conditions are met. The configured values are stored in the Cloud Insight service and can also be viewed in the event rules of the Cloud Insight service.
    In the Alerts tab, you can find the following information.

    • Cloud Insight Event Rule Group
      • You can create, delete, or view event rule groups for the selected cluster.
      • You can set threshold conditions for metrics under the event rule group.
      • The following product type is supported.
        • Ncloud Kubernetes Service

    Click the created Cloud Insight event rule group to view the following sub-groups.

    • Monitor Group
      • The resource group whose metrics are generated for monitoring.
    • Metric Group
      • The conditions on the metrics generated in the monitor group that trigger an alert.
      • Click the New Metric button to add conditions.
    • Notification Group
      • The group of recipients notified when an alert condition set in the metric group is triggered.
      • Click the New Recipient button to register notification recipients.

    Note
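    The relationship between the three sub-groups can be sketched as a small local model: monitored resources produce metrics, the metric group holds the alert conditions, and the notification group lists who is told when a condition fires. Everything below (class, fields, and function names) is hypothetical and is not the Cloud Insight API; it also simplifies the metric group to one condition per metric.

```python
from dataclasses import dataclass, field


@dataclass
class EventRuleGroup:
    """Hypothetical local model of an event rule group; not the Cloud Insight API."""
    name: str
    monitor_group: list[str]  # monitored resources, e.g. a cluster UUID
    metric_group: dict[str, str] = field(default_factory=dict)  # metric -> condition (one per metric here)
    notification_group: list[str] = field(default_factory=list)  # recipients

    def add_metric(self, metric: str, condition: str) -> None:
        self.metric_group[metric] = condition

    def add_recipient(self, recipient: str) -> None:
        self.notification_group.append(recipient)


def alerts_to_send(group: EventRuleGroup, triggered_metrics: list[str]) -> list[tuple[str, str]]:
    """Pair every recipient with every triggered metric that has a condition in the group."""
    return [
        (recipient, metric)
        for metric in triggered_metrics
        if metric in group.metric_group
        for recipient in group.notification_group
    ]


group = EventRuleGroup("example-cluster-alerts", monitor_group=["example-cluster-uuid"])
group.add_metric("node_not_ready_count", ">= 1")
group.add_recipient("ops-team")
print(alerts_to_send(group, ["node_not_ready_count", "real_cpu"]))
# [('ops-team', 'node_not_ready_count')]
```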

    The recipients in the notification group can be managed in the Notification Recipient menu within the Cloud Insight service. This feature is not supported in the Alerts menu of the Ncloud Kubernetes Service.

    The following are examples of conditions you can register in your metric group.

    Metrics examples

    • If a controller such as a Deployment requests a specific number of pods, but fewer pods than expected are running (for example, because of admission controllers)
      • Set pod_available_ratio < 0.7 to trigger an alert if the ratio of available pods to desired pods is less than 70%
    • If the number of pods requested by a controller such as a Deployment that are not Running is increasing
      • Set pod_available_ratio < 0.7 to trigger an alert if the ratio of available pods to desired pods is less than 70%
      • Set pod_phase_pending >= 1 to trigger an alert if at least one pod is in the Pending phase
    • If a node is not Ready
      • Set node_available_ratio < 0.8 to trigger an alert if the ratio of Ready nodes to total nodes is less than 80%
      • Set node_not_ready_count >= 1 to trigger an alert if at least one node is not Ready
    • If pods keep crashing and their restart count is increasing
      • Set pod_restart_count >= 10 to trigger an alert if the pods belonging to a controller have been restarted (for example, by the kubelet) 10 or more times in total
    • If a Job, such as one created by a CronJob, fails and its pod enters the Failed phase
      • Set pod_phase_failed >= 1 to trigger an alert if a job fails
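    The example conditions above are simple threshold comparisons on a single metric. The sketch below evaluates them against a set of current metric values to show which alerts would fire. The rule tuple format, the function name, and the sample values are hypothetical; in practice, Cloud Insight evaluates the conditions you register.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge, "==": operator.eq}

# Conditions taken from the examples above, as (metric, operator, threshold).
EXAMPLE_RULES = [
    ("pod_available_ratio", "<", 0.7),
    ("pod_phase_pending", ">=", 1),
    ("node_available_ratio", "<", 0.8),
    ("node_not_ready_count", ">=", 1),
    ("pod_restart_count", ">=", 10),
    ("pod_phase_failed", ">=", 1),
]


def triggered(rules, values):
    """Return the rules whose condition holds for the given metric values; missing metrics are skipped."""
    return [(m, op, t) for m, op, t in rules if m in values and OPS[op](values[m], t)]


current = {
    "pod_available_ratio": 0.5,
    "pod_phase_pending": 0,
    "node_available_ratio": 1.0,
    "node_not_ready_count": 0,
    "pod_restart_count": 12,
    "pod_phase_failed": 0,
}
print(triggered(EXAMPLE_RULES, current))
# [('pod_available_ratio', '<', 0.7), ('pod_restart_count', '>=', 10)]
```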
