Available in VPC
You might run into the following problems when using Ncloud Kubernetes Service. Each entry describes the cause and possible solutions.
Cluster anomaly
An Ncloud Kubernetes Service cluster remains in progress for a long time.
Cause
The cause can vary depending on your environment and the task being performed.
Solution
If a task remains in progress for a long period of time during cluster creation or deletion, node pool scale-in or scale-out, or a cluster upgrade, contact Customer support.
Cilium-Operator pod in Pending status after creating cluster
After I created a cluster in Ncloud Kubernetes Service, a Cilium-Operator pod is in Pending status.
Cause
For stable operation of Cilium, the CNI (Container Network Interface) plugin used by Ncloud Kubernetes Service, the number of Cilium operator replicas is set to 2. Because the two replicas are scheduled on separate worker nodes, one of them stays in Pending status in a cluster with only 1 worker node. This is intended by design and does not affect the actual operation of Cilium.
Solution
Either scale the Cilium-Operator down to 1 replica or add worker nodes so that both replicas can be scheduled.
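If you choose to scale the operator down instead of adding a worker node, a strategic merge patch like the following sets the replica count to 1. This is a sketch that assumes the operator runs as a Deployment named cilium-operator in the kube-system namespace, as in a standard Cilium installation; confirm the name with `kubectl -n kube-system get deployments` before applying it.

```yaml
# Strategic merge patch for the Cilium operator Deployment.
# Assumption: the Deployment is named cilium-operator and lives in kube-system.
spec:
  replicas: 1   # one replica fits on a single worker node, so no pod stays Pending
```

Apply it with `kubectl -n kube-system patch deployment cilium-operator --patch-file cilium-operator-replicas.yaml` (the file name is hypothetical); `kubectl -n kube-system scale deployment cilium-operator --replicas=1` has the same effect. If you add worker nodes later, set the value back to 2.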
Worker node in Not Ready status
A worker node of Ncloud Kubernetes Service is in Not Ready status.
Cause
There can be various causes, but the most common one is high resource usage on the worker node.
Solution
First, check the resource usage of the worker node. Then reduce the load, either by rescheduling the pods running on that node to other worker nodes or by scaling down workloads. After that, restart the worker node and check whether its status changes to Ready.
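One way to move a workload off a busy worker node is a node affinity rule that excludes that node. The sketch below assumes a Deployment named example-app and a node named overloaded-node-1; both names, the labels, and the image are hypothetical. Changing the pod template triggers a rolling update, so the replacement pods are scheduled on other nodes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                 # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      affinity:
        nodeAffinity:
          # Only schedule these pods on nodes whose hostname is NOT the busy node.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: NotIn
                values:
                - overloaded-node-1  # hypothetical name of the overloaded worker node
      containers:
      - name: app
        image: nginx:1.25            # placeholder image
```

Alternatively, `kubectl cordon <node-name>` marks the node unschedulable, and `kubectl drain <node-name> --ignore-daemonsets` evicts its pods so that they are rescheduled elsewhere. Remove the affinity rule, or run `kubectl uncordon <node-name>`, once the node is Ready again.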
To prevent worker node problems due to high resource usage, it is recommended to use the following Kubernetes features:
The server widget provided by default in Cloud Insight does not include buff/cache when it reports memory usage, so the value it shows may differ from the actual usage. Because buff/cache is not returned when memory usage is high, an Out of Memory (OOM) condition is likely to occur. Keep this in mind when monitoring.
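Setting resource requests and limits is one standard Kubernetes way to keep a node's memory usage bounded. The sketch below is a general Kubernetes illustration, not Ncloud-specific guidance; the namespace, names, image, and values are hypothetical. The Pod sets requests, which the scheduler uses when placing the pod, and limits, so that a container exceeding its memory limit is OOM-killed instead of exhausting node memory. The LimitRange applies default values to containers in the namespace that do not set their own.

```yaml
# Hypothetical pod with explicit requests and limits.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  namespace: example-namespace
spec:
  containers:
  - name: app
    image: nginx:1.25        # placeholder image
    resources:
      requests:
        cpu: 250m            # reserved for scheduling decisions
        memory: 256Mi
      limits:
        cpu: 500m            # CPU is throttled above this value
        memory: 512Mi        # the container is OOM-killed above this value
---
# Hypothetical namespace-wide defaults for containers that omit resources.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources
  namespace: example-namespace
spec:
  limits:
  - type: Container
    defaultRequest:          # used when a container sets no requests
      cpu: 250m
      memory: 256Mi
    default:                 # used when a container sets no limits
      cpu: 500m
      memory: 512Mi
```

The namespace must already exist, and a LimitRange only affects pods created after it.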
Pod in Evicted status
Pods in Evicted status appear, or keep being created.
Cause
Evicted status means that the kubelet has stopped the pod. The kubelet, a Kubernetes component that runs on each node, monitors node resources such as memory and disk. When a monitored resource reaches its eviction threshold, the kubelet stops pods to reclaim that resource.
Solution
If pods are in Evicted status, monitor the node's resources and reduce their usage to below the threshold.
Ncloud Kubernetes Service uses the default Kubernetes thresholds. For more information, see Node-pressure Eviction.
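For reference, the upstream kubelet default hard-eviction thresholds for Linux nodes can be written as a KubeletConfiguration fragment. It is a sketch to make the thresholds concrete, not something you are expected to apply to Ncloud Kubernetes Service worker nodes; newer Kubernetes releases may add further signals, so check Node-pressure Eviction for your version.

```yaml
# Reference only: upstream default hard-eviction thresholds for Linux nodes.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"   # evict when available memory drops below 100 MiB
  nodefs.available: "10%"     # evict when the node filesystem has less than 10% free space
  imagefs.available: "15%"    # evict when the image filesystem has less than 15% free space
  nodefs.inodesFree: "5%"     # evict when fewer than 5% of node filesystem inodes are free
```

When a threshold is crossed, pods whose usage exceeds their requests are among the first candidates for eviction, which is why setting realistic requests on workloads helps keep them running.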
Cannot access Kubernetes cluster through ncp-iam-authenticator
I cannot access a Kubernetes cluster through ncp-iam-authenticator.
Cause
The cause and solution depend on the error message that ncp-iam-authenticator returns.
Solution
The causes and solutions of each error message are as follows:
- Cluster is undefined (400): occurs when an incorrect cluster UUID is entered.
- Authentication Failed (200): occurs when an incorrect authentication key is used. ncp-iam-authenticator uses the authentication key stored in ~/.ncloud/configure. Check whether the key has expired or has been deleted, disabled, or entered incorrectly.
- Not Found Exception (404): check the values of ncloud_api_url and region.
- You must be logged in to the server (Unauthorized): occurs when the access account does not have the required permissions on the cluster. Configure permissions for the access account as described in IAM authentication and user management.
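Several of these errors come down to values in the kubeconfig that calls ncp-iam-authenticator. The excerpt below is an assumed layout of the users entry in such a kubeconfig, with placeholder values; compare it against the kubeconfig generated for your cluster rather than copying it. The cluster UUID is the value to check for Cluster is undefined (400), and the region, together with ncloud_api_url in ~/.ncloud/configure, for Not Found Exception (404).

```yaml
# Assumed excerpt of a kubeconfig "users" entry; values are placeholders.
users:
- name: ncp-iam-authenticator
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: ncp-iam-authenticator
      args:
      - token
      - --clusterUuid
      - 00000000-0000-0000-0000-000000000000   # placeholder; a wrong UUID gives "Cluster is undefined (400)"
      - --region
      - KR                                     # placeholder region code; check it on "Not Found Exception (404)"
```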
Kubernetes dashboard access errors
- When I access the Kubernetes dashboard, I cannot view resources.
- I cannot access the Kubernetes dashboard with a sub account.
- "Unauthorized" appears on the Kubernetes dashboard.
Cause
The dashboard (Kubernetes Dashboard) in Ncloud Kubernetes Service grants the same access permissions as the account's cluster permissions. To use the dashboard, the access account must have the appropriate cluster permissions.
Solution
See IAM authentication and user management to grant the cluster permissions to the access account.
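Once an access account is mapped to a Kubernetes user through IAM authentication, standard Kubernetes RBAC decides what that user can see on the dashboard. As a sketch, assuming the mapping produces a Kubernetes user name, the following ClusterRoleBinding grants read-only access through the built-in view ClusterRole. The binding name and the user name are hypothetical; the user name must match whatever the IAM mapping assigns to the account.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dashboard-viewer-example     # hypothetical binding name
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: example-sub-account          # hypothetical; use the user name produced by the IAM mapping
roleRef:
  kind: ClusterRole
  apiGroup: rbac.authorization.k8s.io
  name: view                         # built-in read-only ClusterRole
```

Binding edit or admin instead of view grants write access; choose the narrowest role that covers what the account needs to do on the dashboard.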
Cannot change PVC size
I cannot change the PVC size.
Cause
You cannot change the size of a volume while it is in use.
Solution
Scale the workload that uses the volume down to 0 replicas.
Reduce the replica count of the workload (for example, a Deployment or StatefulSet) that uses the storage bound to the PVC to 0 so that the storage changes to Available status. You can resize the PVC volume only when the storage is in Available status.
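The resize itself is a change to spec.resources.requests.storage on the existing PVC. The sketch below assumes a PVC named example-pvc that was created from the default nks-block-storage class with 10Gi and is used by a Deployment named example-app; all names and sizes are hypothetical. For example, `kubectl scale deployment example-app --replicas=0` scales the workload down before the change, and running it again with the original `--replicas` value brings the workload back afterward.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc                  # hypothetical PVC name
spec:
  accessModes:
  - ReadWriteOnce                    # must match the existing claim
  storageClassName: nks-block-storage
  resources:
    requests:
      storage: 20Gi                  # raised from the hypothetical original 10Gi
```

Apply it with `kubectl apply -f`, or make the same change with `kubectl edit pvc example-pvc`. The size can only be increased; Kubernetes does not allow shrinking a PVC. Expansion works here because the default storage classes set allowVolumeExpansion to true.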
Cannot change Storage Class settings
I cannot change Storage Class settings.
Cause
A Storage Class cannot be changed after it is created.
Solution
Create a new Storage Class with the settings you need and use it instead. The default Storage Class specifications provided by Ncloud Kubernetes Service are as follows:
- Block Storage
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nks-block-storage
parameters:
  type: SSD
provisioner: blk.csi.ncloud.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
- NAS
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nks-nas-csi
mountOptions:
- hard
- nolock
- nfsvers=3
provisioner: nas.csi.ncloud.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
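For example, to keep block storage volumes after their PVCs are deleted, you could create a separate class that copies the default Block Storage settings and changes only reclaimPolicy, then reference it from new claims. The class name and PVC name below are hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nks-block-storage-retain     # hypothetical name for the new class
provisioner: blk.csi.ncloud.com      # same provisioner as the default Block Storage class
parameters:
  type: SSD
reclaimPolicy: Retain                # keep the volume after its PVC is deleted
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-retained-pvc         # hypothetical PVC name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: nks-block-storage-retain
  resources:
    requests:
      storage: 10Gi
```

Existing PVCs keep the class they were created with, because storageClassName cannot be changed on a PVC; only new claims that reference the new class use its settings. With Retain, deleting a PVC leaves its PersistentVolume in Released status, and the volume must then be cleaned up manually.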
GPU node resources not recognized
GPU resources on GPU nodes created in an Ncloud Kubernetes Service cluster are not recognized.
Cause
To use GPU resources properly, you need to install the NVIDIA Device Plugin.
Solution
To install the NVIDIA Device Plugin, see the GPU node guide.
Once the plugin is installed, the GPU resources will be recognized properly.
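With the plugin running, GPU nodes advertise the nvidia.com/gpu extended resource, which you can see under Allocatable in `kubectl describe node <node-name>`. A short test pod that requests one GPU and runs nvidia-smi is a quick way to confirm scheduling works. This is a sketch: the pod name and the CUDA image tag are assumptions, so choose an image compatible with the driver on your nodes, and add tolerations if your GPU nodes are tainted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test               # hypothetical pod name
spec:
  restartPolicy: Never               # run once and stop
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed image tag
    command: ["nvidia-smi"]          # prints the GPUs visible to the container
    resources:
      limits:
        nvidia.com/gpu: 1            # extended resource advertised by the NVIDIA Device Plugin
```

GPUs are requested through limits; if the pod stays Pending with an insufficient nvidia.com/gpu message, the plugin is not yet advertising GPUs on any node.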