Cluster Monitoring

Available in VPC

Note
  • Cluster Monitoring is available for clusters running Kubernetes 1.23 or later.
  • If your cluster runs a version earlier than 1.23, upgrade it to 1.23 or later using the upgrade function.
  • Sub Accounts need permission for the View/getClusterDetail action on the Cluster/* resource type to use Cluster Monitoring.

Cluster Monitoring collects resource usage data from a Kubernetes cluster and forwards it to Cloud Insight. You can view the collected data on the Grafana dashboard or the Cloud Insight dashboard in Ncloud Kubernetes Service on the NAVER Cloud Platform console. This lets you check resource usage without installing a separate monitoring tool.

Supported versions

Feature | Kubernetes 1.22 or earlier | Kubernetes 1.23 or later
Kubernetes Cluster Monitoring | Not supported | Supported
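If you manage several clusters, a quick way to tell which ones qualify is to compare each cluster's major and minor version against 1.23. The sketch below assumes version strings such as 1.27.9 or v1.23, optionally followed by a "-suffix"; the function name supports_cluster_monitoring is hypothetical and is not part of any NAVER Cloud Platform API.

```python
# Minimum Kubernetes (major, minor) version required by Cluster Monitoring.
MIN_VERSION = (1, 23)


def supports_cluster_monitoring(version: str) -> bool:
    """Return True if version (e.g. "1.27.9" or "v1.23") is 1.23 or later.

    Hypothetical helper: assumes a "major.minor[.patch][-suffix]" format.
    """
    # Drop an optional leading "v" and anything after the first "-".
    core = version.lstrip("v").split("-")[0]
    major, minor = (int(part) for part in core.split(".")[:2])
    return (major, minor) >= MIN_VERSION


print(supports_cluster_monitoring("1.22.17"))  # False
print(supports_cluster_monitoring("v1.27.9"))  # True
```

Comparing (major, minor) tuples avoids the string-comparison trap where "1.9" would sort after "1.23".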

Pricing information

Cluster Monitoring follows the Cloud Insight pricing plan.
For details about Cloud Insight pricing, see Service > Cloud Insight on the NAVER Cloud Platform portal.

Data retention period

Cluster Monitoring follows the data retention period of Cloud Insight. For more information about the data retention period, see Data retention period in Cloud Insight prerequisites.

Interval | Data retention period (from today) | Recommended search period
Min1 | Within 8 days | 1 day
Min5 | Within 1 month | 1 week
Min30 | Within 3 months | 1 month
Hour2 | Within 6 months | 1 month
Day1 | Within 1 year | 1 month
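The table implies a simple rule when querying older data: the further back you look, the coarser the interval you have to use. The following sketch picks the finest interval that still retains data from a given number of days ago. It treats a month as 30 days and a year as 365 days, which is an approximation, and assumes the "within" boundaries are inclusive; finest_interval is a hypothetical helper, not a Cloud Insight API.

```python
from typing import Optional

# Retention windows from the table above, ordered from finest to coarsest interval.
# Assumption: "1 month" is treated as 30 days and "1 year" as 365 days.
RETENTION_DAYS = [
    ("Min1", 8),
    ("Min5", 30),
    ("Min30", 90),
    ("Hour2", 180),
    ("Day1", 365),
]


def finest_interval(days_ago: float) -> Optional[str]:
    """Return the finest interval whose retention still covers data from days_ago days back."""
    for interval, retention_days in RETENTION_DAYS:
        if days_ago <= retention_days:
            return interval
    return None  # The requested data is older than every retention window.


print(finest_interval(3))    # Min1
print(finest_interval(45))   # Min30
print(finest_interval(400))  # None
```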

Collection target

Note

Monitoring for pods within the kube-system namespace is not supported.

Cluster Monitoring collects metrics of the following types:

Type | Description
CpuMem | CPU and memory usage of pods
Network | Network in/out rate of pods
Disk | Block Storage usage of pods
NodeAvailability | Availability of nodes in the cluster
PodStatus | Status of pods, aggregated per controller

Type: CpuMem

Dimensions

Dimension | Description
type | Defined as CpuMem
clusterUUID | Cluster UUID
nodeInstanceNo | Instance number of the node where the pod is located
namespace | Namespace where the pod is located
controller | Controller that created the pod
pod | Name of the pod

Metrics

Metric | Description
real_cpu | Pod CPU usage
real_mem | Pod memory usage

Type: Network

Dimensions

Dimension | Description
type | Defined as Network
clusterUUID | Cluster UUID
nodeInstanceNo | Instance number of the node where the pod is located
namespace | Namespace where the pod is located
controller | Controller that created the pod
pod | Name of the pod
interface | Network interface on which the traffic was received or sent

Metrics

Metric | Description
network_rx_bytes | Bytes received
network_tx_bytes | Bytes sent

Type: Disk

Dimensions

Dimension | Description
type | Defined as Disk
clusterUUID | Cluster UUID
nodeInstanceNo | Instance number of the node where the pod is located
namespace | Namespace where the pod is located
controller | Controller that created the pod
pod | Name of the pod
pvc | Name of the PersistentVolumeClaim (PVC) used by the pod

Metrics

Metric | Description
available_bytes | Available disk space in bytes
capacity_bytes | Total disk capacity in bytes
used_bytes | Used disk space in bytes
disk_used_ratio | Ratio of used disk space to total capacity, between 0 and 1
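To illustrate how the Disk metrics relate, the sketch below derives a used ratio from used_bytes and capacity_bytes. It assumes disk_used_ratio is computed as used_bytes / capacity_bytes; the exact formula Cluster Monitoring applies is not stated here, and the helper name is hypothetical.

```python
def disk_used_ratio(used_bytes: int, capacity_bytes: int) -> float:
    """Assumed definition: used_bytes / capacity_bytes, clamped to [0, 1]."""
    if capacity_bytes <= 0:
        return 0.0  # No capacity reported; avoid division by zero.
    return min(max(used_bytes / capacity_bytes, 0.0), 1.0)


# Example: a 10 GiB volume with 7.5 GiB used.
gib = 1024 ** 3
print(disk_used_ratio(int(7.5 * gib), 10 * gib))  # 0.75
```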

Type: NodeAvailability

Dimensions

Dimension | Description
type | Defined as NodeAvailability
clusterUUID | Cluster UUID

Metrics

Metric | Description
node_total_count | Total number of nodes in the cluster
node_ready_count | Number of nodes in the cluster that are ready
node_not_ready_count | Number of nodes in the cluster that are not ready
node_available_ratio | node_ready_count / node_total_count, between 0 and 1
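The NodeAvailability metrics can be pictured as counts over each node's Ready condition. The following sketch takes one Ready status per node and computes all four values. Treating "Unknown" as not ready and returning 0.0 for an empty cluster are assumptions made for illustration; node_availability is a hypothetical helper.

```python
from typing import Dict, List


def node_availability(ready_statuses: List[str]) -> Dict[str, float]:
    """Compute NodeAvailability-style metrics from each node's Ready condition status.

    ready_statuses holds one value per node: "True", "False", or "Unknown".
    Assumption: only "True" counts as ready; "False" and "Unknown" count as not ready.
    """
    total = len(ready_statuses)
    ready = sum(1 for status in ready_statuses if status == "True")
    return {
        "node_total_count": total,
        "node_ready_count": ready,
        "node_not_ready_count": total - ready,
        # node_ready_count / node_total_count; 0.0 for an empty cluster (assumption).
        "node_available_ratio": ready / total if total else 0.0,
    }


print(node_availability(["True", "True", "True", "True", "Unknown"]))
# {'node_total_count': 5, 'node_ready_count': 4, 'node_not_ready_count': 1, 'node_available_ratio': 0.8}
```

With one of five nodes not ready, node_available_ratio is 0.8, which sits exactly on the boundary of the node_available_ratio < 0.8 alert example later in this guide and would not trigger it.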

Type: PodStatus

Dimensions

Dimension | Description
type | Defined as PodStatus
clusterUUID | Cluster UUID
namespace | Namespace where the pod is located
controller | Controller that created the pod

Metrics

Metric | Description
pod_phase_pending | Number of the controller's pods in the Pending phase
pod_phase_running | Number of the controller's pods in the Running phase
pod_phase_succeeded | Number of the controller's pods in the Succeeded phase
pod_phase_failed | Number of the controller's pods in the Failed phase
pod_available_ratio | Ratio of available pods to desired pods for the controller, between 0 and 1 (supported for ReplicaSet, DaemonSet, and StatefulSet)
pod_restart_count | Total restart count of all pods belonging to the controller
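Because PodStatus is reported per controller, its values can be pictured as an aggregation over that controller's pods. The sketch below models each pod with its phase, whether it is available, and its restart count, and takes the controller's desired pod count as input (for example, the replica count of a ReplicaSet or StatefulSet). The Pod class, the availability flag, and clamping the ratio to 1.0 are illustrative assumptions, not Cluster Monitoring's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    phase: str          # Pod phase: "Pending", "Running", "Succeeded", "Failed", or "Unknown"
    available: bool     # Assumption: whether the pod counts as available
    restart_count: int  # Total restarts across the pod's containers


def pod_status_metrics(pods: List[Pod], desired: int) -> Dict[str, float]:
    """Aggregate PodStatus-style metrics for the pods of a single controller."""
    phases = Counter(pod.phase for pod in pods)  # Missing phases count as 0.
    available = sum(1 for pod in pods if pod.available)
    return {
        "pod_phase_pending": phases["Pending"],
        "pod_phase_running": phases["Running"],
        "pod_phase_succeeded": phases["Succeeded"],
        "pod_phase_failed": phases["Failed"],
        # Available pods / desired pods, clamped to 1.0; 0.0 if nothing is desired (assumption).
        "pod_available_ratio": min(available / desired, 1.0) if desired > 0 else 0.0,
        "pod_restart_count": sum(pod.restart_count for pod in pods),
    }


pods = [Pod("Running", True, 0), Pod("Running", True, 3), Pod("Pending", False, 0)]
print(pod_status_metrics(pods, desired=3))
```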

Grafana

Grafana is a visualization tool for time series data. You can use the Grafana dashboard to view the data forwarded to Cloud Insight.
You can find the Grafana link from the Monitoring section of the Cluster sub-list within Ncloud Kubernetes Service on the NAVER Cloud Platform console.

Note

Using the Grafana dashboard incurs no fees.

Overview

The Overview lists the clusters you own. Click the following buttons to monitor the resources in a cluster:

  • Nodes: View the availability of worker nodes in the cluster and their CPU, memory, disk, and network usage.
  • Pods: View the CPU, memory, disk, and network usage of pods in the cluster.

Alerts

You can set thresholds for resources in the cluster and configure event rules that send notifications when specific conditions are met. These settings are stored in Cloud Insight, so you can also view them in the event rules of the Cloud Insight service.
The Alerts tab provides the following:

  • Cloud Insight Event Rule Group
    • Create, delete, or view event rule groups for the selected cluster.
    • Set threshold conditions for metrics in an event rule group.
    • Supported product type:
      • Ncloud Kubernetes Service

Click a Cloud Insight event rule group you created to view its sub-groups:

  • Monitor Group
    • The resource group whose metrics are monitored.
  • Metric Group
    • The conditions that trigger a notification for metrics generated in the monitor group.
    • Click New Metric to register a condition.
  • Notification Group
    • The recipients notified when a condition set in the metric group is triggered.
    • Click New Recipient to register a notification recipient.

Note

Notification recipients themselves are managed in the Notification Recipient service on the NAVER Cloud Platform console. Recipient management is not available in the Alerts menu of Ncloud Kubernetes Service.

The following are examples of conditions you can register in your Metric Group.

Metrics examples

  • If a controller, such as a Deployment, requests a specific number of pods, but fewer pods than expected are running (for example, because an admission controller rejects them)
    • Set pod_available_ratio < 0.7 to be notified when fewer than 70% of the desired pods are available
  • If more and more of the pods requested by a controller, such as a Deployment, are not in the Running phase
    • Set pod_available_ratio < 0.7 to be notified when fewer than 70% of the desired pods are available
    • Set pod_phase_pending >= 1 to be notified when at least one pod is in the Pending phase
  • If a node is not ready
    • Set node_available_ratio < 0.8 to be notified when fewer than 80% of the nodes are ready
    • Set node_not_ready_count >= 1 to be notified when at least one node is not ready
  • If pods keep crashing and being restarted
    • Set pod_restart_count >= 10 to be notified when the controller's pods have been restarted (for example, by the kubelet) 10 or more times in total
  • If a job, such as one created by a CronJob, fails and its pod enters the Failed phase
    • Set pod_phase_failed >= 1 to be notified when at least one pod is in the Failed phase
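To make the example conditions concrete, the sketch below checks them against a single snapshot of metric values. It is only a local illustration: Cloud Insight evaluates event rules itself, and this code does not call any Cloud Insight API or model any duration or aggregation settings a rule may have. RULES, triggered_rules, and the sample values are hypothetical.

```python
import operator
from typing import Dict, List

# Comparison operators used in the example conditions above.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# The example conditions from this section, as (metric, operator, threshold).
RULES = [
    ("pod_available_ratio", "<", 0.7),
    ("pod_phase_pending", ">=", 1),
    ("node_available_ratio", "<", 0.8),
    ("node_not_ready_count", ">=", 1),
    ("pod_restart_count", ">=", 10),
    ("pod_phase_failed", ">=", 1),
]


def triggered_rules(values: Dict[str, float]) -> List[str]:
    """Return each example condition that holds for a single snapshot of metric values."""
    fired = []
    for metric, op, threshold in RULES:
        value = values.get(metric)
        # Metrics missing from the snapshot are skipped rather than treated as zero.
        if value is not None and OPS[op](value, threshold):
            fired.append(f"{metric} {op} {threshold}")
    return fired


snapshot = {
    "pod_available_ratio": 0.5,
    "pod_phase_pending": 0,
    "node_available_ratio": 1.0,
    "node_not_ready_count": 0,
    "pod_restart_count": 12,
    "pod_phase_failed": 0,
}
print(triggered_rules(snapshot))  # ['pod_available_ratio < 0.7', 'pod_restart_count >= 10']
```

Pairing a ratio condition with a count condition, as the examples do, catches both gradual degradation (the ratio drifts down) and a single failure that a ratio threshold would miss on a large controller or cluster.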