Troubleshoot scalability and performance issues


Available in VPC

You might run into the following problems when using Ncloud Kubernetes Service. Check the causes and possible solutions below.

Scale In/Out not occurring after setting Cluster Autoscaler

Cluster Autoscaler is enabled, but Scale In/Out does not occur.

Cause

In a cluster with Cluster Autoscaler enabled, nodes are added when pods cannot be scheduled due to insufficient resources, and nodes are removed when a node's resource usage stays low for a certain period of time. High pod load alone does not increase the number of nodes if no resource requests are set.

Cluster Autoscaler performs Scale In/Out when the following conditions are met:

  • Conditions for Scale Out

    • When a pod is waiting in Pending status
    • When no existing node has sufficient resources to accommodate the pod
  • Conditions for Scale In

    • When a node's pods can be scheduled to other nodes
    • When a specific node is unused or maintains low resource usage for a certain period

Cluster Autoscaler may not operate for various reasons, so you must check detailed causes.
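
To check the detailed cause, first look for pods stuck in Pending status and check why they cannot be scheduled. The following is a generic kubectl example; replace <pod-name> and <namespace> with your own values.

$ kubectl get pods --all-namespaces --field-selector=status.phase=Pending
$ kubectl describe pod <pod-name> -n <namespace>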

Solution

  • Cluster Autoscaler does not operate if the pods in use are missing Resource Request and Limit settings. You must complete the Resource Request and Limit settings; a way to check the current settings is shown in the example after the note below.
  • Nodes that fall under the Scale Down exceptions are not reduced. The exceptions are as follows:
    • When pods on the node are not managed by a controller, such as a Deployment or StatefulSet
    • When pods on the node use local storage
    • When a pod cannot be moved to another node
    • When the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict" is set to "false"
    • For other exceptions, see Official documentation
Note
  • For more information on the Scale Down methods of Cluster Autoscaler, see How does scale-down work?
  • Scaling can be inefficient when the actual node resource usage differs from the pod resource requests, so it is recommended to set the pod resource requests to appropriate values.
  • Even if the actual node resource usage is low, if a pod is in Pending status because of its resource requests, Cluster Autoscaler detects it and expands the nodes automatically. For more information, see How to schedule pods that include resource requests
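
As a quick check for the items above, you can view a pod's resource requests and limits and its safe-to-evict annotation with generic kubectl commands; <pod-name> and <namespace> are placeholders.

$ kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
$ kubectl describe pod <pod-name> -n <namespace> | grep safe-to-evict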

Changing parameters when using Cluster Autoscaler

  • I want to change or view parameters when using Cluster Autoscaler.
  • I cannot change parameters when using Cluster Autoscaler.

Cause

When setting up Cluster Autoscaler, the default parameters are used. The ability to change parameters is not provided.

Solution

For more information on the parameters and default values used in Cluster Autoscaler, see What are the parameters to CA

Excluding specific nodes from Scale Down

I want to exclude certain nodes from Scale Down when using Cluster Autoscaler.

Cause

By default, Cluster Autoscaler treats underutilized nodes as candidates for Scale Down. Unless explicitly excluded, any node may be considered for reduction.

Solution

Add annotations to the nodes that should be excluded from Scale Down.

$ kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-disabled=true
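
To confirm that the annotation was applied, or to remove it later so the node becomes a Scale Down candidate again, the following generic kubectl commands can be used (the trailing "-" removes the annotation):

$ kubectl describe node <nodename> | grep scale-down-disabled
$ kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-disabled-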

Cluster expanded automatically

A cluster was expanded automatically after enabling Cluster Autoscaler.

Cause

For stable networking, the cilium-operator is configured to run 2 pods. On a single worker node, one pod is in Running status while the other remains in Pending status due to scheduling constraints. This triggers Cluster Autoscaler to start expansion for the cilium-operator pod in Pending status.

Solution

Adjust the number of replicas of the cilium-operator.

$ kubectl scale --replicas=1 deploy/cilium-operator -n kube-system

However, operating only one worker node requires extra caution due to the potential decrease in stability.
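
To confirm the change, check that only one cilium-operator pod is Running and that no replica remains in Pending status. This is a simple check using standard kubectl:

$ kubectl get deploy cilium-operator -n kube-system
$ kubectl get pods -n kube-system | grep cilium-operator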

No new nodes created even though the total number of nodes is below the maximum limit

When using Cluster Autoscaler, I cannot create new nodes by adjusting the minimum number of nodes, even though the cluster has not yet reached the maximum limit.

Cause

Once Cluster Autoscaler is enabled, adjusting the minimum number of nodes does not automatically add nodes. Nodes are added only when there are pods that cannot be scheduled due to insufficient resources.

Solution

Disable Cluster Autoscaler and manually increase the number of static nodes.

High resource usage by worker node

I can see abnormally high resource usage, such as CPU and memory, in the worker node.

Cause

The main causes of high resource usage are as follows:

  • Excessive pod deployment: When the number of pods deployed in the worker node is too high
  • Resource limits not set: When there is no Resource Request and Limit for pods
  • Memory leak: When memory leak occurs in the operational pod

Solution

Check pod deployment status

  • Run the following commands to identify nodes and pods with high resource usage, and adjust them within the cluster as necessary:
    $ kubectl top nodes
    $ kubectl top pods
    
  • Access the worker node and check its memory usage. Check the used and buff/cache values; since high buff/cache usage can also indicate a problem, check it as well.
    $ free -h
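  • When there are many pods, sorting the output makes it easier to find the heaviest consumers. Depending on the kubectl version, kubectl top pods can sort by CPU or memory; this is a generic kubectl example, not specific to Ncloud Kubernetes Service.
    $ kubectl top pods --all-namespaces --sort-by=cpu
    $ kubectl top pods --all-namespaces --sort-by=memory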
    

Resource Request and Limit settings
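
Set the Resource Request and Limit for pods so that the scheduler and Cluster Autoscaler can make accurate decisions. The following is a minimal sketch using kubectl set resources on an existing Deployment; <deployment-name>, <namespace>, and the CPU/memory values are placeholders that must be adjusted to your workload.

$ kubectl set resources deployment <deployment-name> -n <namespace> --requests=cpu=100m,memory=256Mi --limits=cpu=500m,memory=512Mi

After applying the change, verify the values with kubectl describe deployment <deployment-name> -n <namespace>.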

Note

If you're still having trouble finding what you need, click on the feedback icon and send us your thoughts and requests. We'll use your feedback to improve this guide.