Troubleshoot common issues

Available in VPC

You might run into the following problems when using Ncloud Kubernetes Service. This page describes their causes and possible solutions.

Cluster anomaly

A task on an Ncloud Kubernetes Service cluster remains in progress for a long period of time.

Cause

There may be various causes depending on your task environment.

Solution

If a task remains in progress for a long period of time during cluster creation or deletion, node pool scale-in or scale-out, or a cluster upgrade, contact Customer support.

Cilium-Operator pod in Pending status after creating cluster

After I created a cluster in Ncloud Kubernetes Service, the Cilium-Operator pod remains in Pending status.

Cause

To keep Cilium, the CNI plugin, running stably, Ncloud Kubernetes Service sets the number of Cilium operators to 2. In a cluster with only 1 worker node, 1 of the 2 Cilium operators therefore stays in Pending status. This is by design and does not affect the actual operation of Cilium.

Solution

Reduce the scale of Cilium-Operator to 1 or increase the number of worker nodes to 2 or more.
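
Scaling the operator down could be expressed as a patch to its Deployment. The fragment below is a minimal sketch, assuming the operator runs as a Deployment named cilium-operator in the kube-system namespace. That is the usual Cilium layout but is not confirmed by this guide, so verify the name in your cluster first.

```yaml
# Hypothetical strategic-merge patch for the Cilium operator Deployment.
# Assumption: the Deployment is "cilium-operator" in namespace "kube-system".
spec:
  replicas: 1   # a single operator replica fits a cluster with 1 worker node
```

A fragment like this could be applied with kubectl patch and its --patch-file option. If you later add worker nodes, consider restoring the replica count to 2.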

Worker node in Not Ready status

A worker node of Ncloud Kubernetes Service is in Not Ready status.

Cause

There may be various causes, but the most common one is high resource usage on the worker node.

Solution

First, check the resource usage of the worker node. Then either reschedule the pods running on that worker node to a different worker node or scale down the workload to reduce resource usage. After that, restart the worker node and check whether its status changes to Ready.

To prevent worker node problems due to high resource usage, it is recommended to use the following Kubernetes features:

The server widget provided by default in Cloud Insight does not take buff/cache into account when monitoring memory usage, so the displayed value may differ from the actual usage. Because buff/cache is not released when memory usage is high, Out of Memory (OOM) is likely to occur. Keep this in mind when you monitor memory.
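
One commonly used Kubernetes mechanism for keeping a worker node's resource usage bounded is to set resource requests and limits on containers. The following is a minimal sketch only; the Pod name, image, and values are hypothetical and must be sized for your own workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app          # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25      # placeholder image
      resources:
        requests:            # amount the scheduler reserves on the node
          cpu: 250m
          memory: 256Mi
        limits:              # upper bound enforced at runtime
          cpu: 500m
          memory: 512Mi
```

With requests set, the scheduler avoids placing more pods on a node than its allocatable resources can hold, and the memory limit stops a single container from exhausting the node's memory.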

Pod in Evict status created

Pods in Evict status have been created or keep being created.

Cause

The Evict status means that Kubelet has stopped the pod. Kubelet, a component of Kubernetes, monitors node resources. If a monitored node resource reaches its threshold, Kubelet stops pods to reclaim the resource.

Solution

If a pod is in Evict status, monitor the node's resource usage and bring it back below the threshold.
The thresholds in Ncloud Kubernetes Service follow the default settings of Kubernetes. For more information, see Node-pressure Eviction.
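
For reference, the default hard eviction thresholds that upstream Kubernetes documents for Linux nodes correspond to the kubelet configuration below. It is shown for illustration only, on the assumption that the service keeps these upstream defaults unchanged; it is not a file you need to apply.

```yaml
# Upstream Kubernetes default hard eviction thresholds (Linux nodes).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"   # evict when free memory drops below 100Mi
  nodefs.available: "10%"     # node filesystem free space
  imagefs.available: "15%"    # image filesystem free space
  nodefs.inodesFree: "5%"     # free inodes on the node filesystem
```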

Cannot access Kubernetes Cluster through ncp-iam-authenticator

I cannot access Kubernetes Cluster through ncp-iam-authenticator.

Cause

When using ncp-iam-authenticator, the cause and solution vary depending on the error message.

Solution

The causes and solutions of each error message are as follows:

Cluster is undefined (400): occurs when an incorrect cluster UUID is entered.
Authentication Failed (200): occurs when you use an incorrect authentication key value. ncp-iam-authenticator uses the authentication key value under ~/.ncloud/configure. Check whether the authentication key is expired, deleted, disabled, or incorrect.
Not Found Exception (404): check the value of ncloud_api_url and region.
You must be logged in to the server (Unauthorized): occurs when the access account does not have the required permissions in the cluster. Configure permissions for the access account as described in IAM authentication and user management.

Kubernetes dashboard access errors

When I access Kubernetes dashboard, I cannot view the resource.
I cannot access the Kubernetes dashboard with a sub account.
"Unauthorized" appears on the Kubernetes dashboard.

Cause

The Kubernetes Dashboard in Ncloud Kubernetes Service applies the same access permissions as the cluster. To use the dashboard, the access account must have the appropriate cluster permissions.

Solution

See IAM authentication and user management to grant the cluster permissions to the access account.

Cannot change PVC size

I cannot change the PVC size.

Cause

You cannot resize a volume while it is in use.

Solution

Set the replica count of the workload that uses the PVC to 0.
Scaling the workload that uses the storage bound to the PVC down to 0 replicas puts the storage in the Available status. You can resize the PVC volume only while the storage is in the Available status.
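
The two changes involved could look like the following minimal sketch. The workload and PVC names and the sizes are hypothetical, and the example assumes the PVC uses a Storage Class that has allowVolumeExpansion set to true.

```yaml
# 1) Patch fragment for the workload that mounts the PVC
#    (hypothetical Deployment or StatefulSet): scale it to zero
#    so the storage is detached and becomes Available.
spec:
  replicas: 0
---
# 2) The PVC with a larger requested size (hypothetical name and sizes).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nks-block-storage
  resources:
    requests:
      storage: 20Gi   # increased from an assumed original 10Gi
```

After the resize completes, restore the workload's original replica count.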

Cannot change Storage Class settings

I cannot change Storage Class settings.

Cause

A Storage Class cannot be changed after it has been created.

Solution

Create and use a new Storage Class with the necessary settings. The default Storage Class specifications provided by Ncloud Kubernetes Service are as follows:

Block Storage

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nks-block-storage
parameters:
  type: SSD
provisioner: blk.csi.ncloud.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

NAS

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nks-nas-csi
mountOptions:
  - hard
  - nolock
  - nfsvers=3
provisioner: nas.csi.ncloud.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
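
As an illustration of creating a new Storage Class rather than editing an existing one, the sketch below copies the block storage specification and changes only the reclaim policy. The name custom-block-retain is hypothetical, and whether Retain suits your data lifecycle is an assumption to check for yourself.

```yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: custom-block-retain   # hypothetical name
parameters:
  type: SSD
provisioner: blk.csi.ncloud.com
reclaimPolicy: Retain         # keep the volume after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```

New PVCs can then reference this class through storageClassName; existing PVCs keep the class they were created with.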

GPU node resources not recognized

GPU resources on GPU nodes created in an Ncloud Kubernetes Service cluster are not recognized.

Cause

To use GPU resources properly, you need to install the NVIDIA Device Plugin.

Solution

To install the NVIDIA Device Plugin, see the GPU node guide.
Once the plugin is installed, the GPU resources will be recognized properly.
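
One way to confirm that GPUs are being recognized is to schedule a pod that requests the nvidia.com/gpu resource, which is the resource name the NVIDIA Device Plugin advertises. The following is a minimal sketch; the Pod name and image are placeholders, not values taken from this guide.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-check                 # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder CUDA image
      command: ["nvidia-smi"]     # prints the GPUs visible to the container
      resources:
        limits:
          nvidia.com/gpu: 1       # request one GPU from the device plugin
```

If the pod stays in Pending status with an insufficient nvidia.com/gpu message, the plugin is probably not running on the GPU node yet.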
