Troubleshooting Ncloud Kubernetes Service
    Available in VPC

    This document details problematic situations users may face while using Ncloud Kubernetes Service, as well as their causes and resolutions. However, some problems may be difficult for users to handle themselves, even with the following information at hand. To efficiently address these user difficulties, NAVER Cloud Platform also provides various channels to resolve them. If you are unable to resolve a problem using the troubleshooting guide, you can request help from the Customer Center.

    Setting up and managing Ncloud Kubernetes Service

    Q. An Ncloud Kubernetes Service cluster stays in the task-in-progress status for a long time.

    A. If a cluster stays in the task-in-progress status for too long during cluster creation or deletion, node pool scale-in/out, cluster upgrade, and so on, submit a customer inquiry on the NAVER Cloud Platform portal to resolve the issue.

    Q. Cilium-Operator pod is pending after I created a cluster in Ncloud Kubernetes Service.

    A. For stable operation of the Cilium CNI, Ncloud Kubernetes Service sets the number of Cilium operators to 2. Therefore, in a cluster with a single worker node, one Cilium operator remains in the Pending status. This behavior is by design and does not affect the actual operation of Cilium. To resolve the issue, you can scale down the Cilium-Operator deployment or increase the number of worker nodes.

    Q. Worker node of Ncloud Kubernetes Service is in Not Ready status.

    A. A worker node can be in the Not Ready status for various reasons, most of them related to high resource usage on the node. First check the node's resource usage, then reduce it by scheduling the pods on that node to other worker nodes or by reducing the number of scheduled pods. After that, restart the worker node and check whether its status changes to Ready.
    To prevent worker node problems caused by high resource usage, use Kubernetes features such as resource requests and limits.
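    The features mentioned above can be sketched in a pod manifest. Declaring resource requests and limits lets the scheduler place the pod only on nodes with enough free capacity and caps the pod's usage, which helps keep nodes out of the Not Ready state. The pod name, image, and values below are illustrative:

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: resource-limited-app      # hypothetical name
    spec:
      containers:
        - name: app
          image: nginx:1.25           # illustrative image
          resources:
            requests:                 # guaranteed minimum, used for scheduling decisions
              cpu: "250m"
              memory: "256Mi"
            limits:                   # hard cap; exceeding the memory limit OOM-kills the container
              cpu: "500m"
              memory: "512Mi"
    ```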

    Q. A pod has been created, or keeps being created, in the Evicted status.

    A. kubelet, a component of Kubernetes, monitors node resources. If a monitored node resource reaches its threshold, kubelet proactively stops pods to reclaim the resource. The Evicted status means that kubelet has stopped the pod. If a pod is in the Evicted status, monitor the node's resource usage and bring it back down below the threshold. The thresholds in Ncloud Kubernetes Service follow the default Kubernetes settings. For more information, see the following official documentation.
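    For reference, the default hard eviction thresholds of upstream Kubernetes for Linux nodes can be expressed in a KubeletConfiguration as follows; these are the upstream default values and are shown only for illustration:

    ```yaml
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    evictionHard:
      memory.available: "100Mi"     # evict pods when available memory drops below 100Mi
      nodefs.available: "10%"       # evict when the node filesystem has less than 10% free
      nodefs.inodesFree: "5%"       # evict when fewer than 5% of inodes remain
      imagefs.available: "15%"      # evict when the image filesystem has less than 15% free
    ```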

    Q. What is the policy concerning the version difference between the control plane and worker nodes after a cluster upgrade?

    A. Ncloud Kubernetes Service follows the Kubernetes version skew policy, which allows a certain degree of version difference between the control plane and worker nodes. For more information on the policy, see the official documentation.

    Deploying Ncloud Kubernetes Service

    Q. ImagePullBackOff occurs while I create a pod.

    A. ImagePullBackOff occurs because the pod cannot pull the image it needs to use. Refer to the following to identify the cause and resolve the issue.

    • An image cannot be pulled if the image name or tag is incorrect.
    • If you are pulling an image from a private registry, the pull fails without an authentication process. Check whether you have used the correct imagePullSecrets. If you are using the Container Registry product, you can create imagePullSecrets by referring to the user guide.
    • In the case of a Kubernetes cluster that uses a private subnet, outbound traffic must be enabled to pull an external image. If it is disabled, you can enable it as follows:
      • NAT Gateway (Old)

        • Go to Console > VPC > NAT Gateway (Old) and create NAT Gateway
        • Go to Console > Products & Services > Networking > VPC > Route Table
        • Select the route table of private subnet that requires internet communication, and click the [Routes Settings] button
        • Add the route rule for external communication
          • Destination: enter the public IP address of the destination in the CIDR format. (For example, enter 0.0.0.0/0 if the entire internet is the destination)
          • Target Type: select the next hop type for communication with the destination (NAT Gateway)
          • Target Name: select the name of the NAT Gateway created
          • Click [Creation] to add and apply the rule
      • NAT Gateway (New)

        • Go to Console > VPC > NAT Gateway (New) and create a public NAT Gateway
        • Go to Console > VPC > Route Table and set the network path for communication through NAT Gateway
    • You may not be able to use certain images that are restricted by Docker Hub's image download (rate limit) policy. To use such images, use a paid account suited for high-volume pulls or a private registry.
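    For the private registry case above, the usual pattern is to create a registry credential Secret (for example with `kubectl create secret docker-registry regcred --docker-server=... --docker-username=... --docker-password=...`) and reference it from the pod spec. The names, registry address, and image below are placeholders:

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: private-image-pod                          # hypothetical name
    spec:
      containers:
        - name: app
          image: my-registry.example.com/team/app:1.0  # placeholder private image; name and tag must be exact
      imagePullSecrets:
        - name: regcred                                # Secret of type kubernetes.io/dockerconfigjson
    ```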

    Q. DNS lookups intermittently fail or take a long time in some pods.

    A. If DNS lookups fail or there is a problem with query processing, check the image used by the affected pod.

    Using the above two images may cause issues in DNS query processing. NAVER Cloud Platform is unable to help resolve technical issues inside container images, so for such issues you need to use other container images or edit the DNS settings.
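    If switching images is not possible, the pod's DNS settings can be tuned through the standard Kubernetes `spec.dnsConfig` field. The sketch below, with an illustrative pod name and image, lowers `ndots` so that absolute names are queried directly instead of iterating through search domains:

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: dns-tuned-pod        # hypothetical name
    spec:
      containers:
        - name: app
          image: nginx:1.25      # illustrative image
      dnsConfig:
        options:
          - name: ndots
            value: "1"           # try the name as-is before appending search domains
    ```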

    Ncloud Kubernetes Service networking

    Q. Target groups that I did not create exist.

    A. If spec.defaultBackend is not specified in the Ingress manifest mapped to an ALB, a target group pointing at port 80 of the worker node group is created. Such target groups are created by design and do not affect the ALB's operation.
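    If you want to avoid the automatically created port-80 target group, you can set spec.defaultBackend explicitly in the Ingress manifest. The Ingress name, Service name, and port below are placeholders:

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-ingress      # hypothetical name
    spec:
      defaultBackend:            # explicit default backend instead of the auto-generated one
        service:
          name: default-svc      # placeholder Service name
          port:
            number: 80
    ```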

    Q. Load Balancers created through Ncloud Kubernetes Service clusters cannot be deleted.

    A. Load Balancers registered in Global DNS cannot be deleted. Check whether the Load Balancer is registered in Global DNS; if it is, deregister it first, and then you can delete it.
    Improvements to this behavior are planned.

    Q. NetworkPolicy is not operating properly.

    A. When the Cilium CNI is used, the following NetworkPolicy functions are known to be unsupported.

    Q. Cilium fails the connectivity test.

    A. Cilium provides a connectivity test suite for checking the network status. If you run the test suite as is on Ncloud Kubernetes Service, the following two tests are bound to fail: pod-to-a-allowed-cnp and pod-to-external-fqdn-allow-google-cnp.
    This is a known issue that occurs when node-local-dns bound to a link-local IP is used. For proper DNS resolution, add the IP band used by node-local-dns to the toCIDR field of each CiliumNetworkPolicy, as shown below.

    # pod-to-a-allowed-cnp
      egress:
      - toPorts:
        - ports:
          - port: "53"
            protocol: ANY
        toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
        toCIDR:
        - 169.254.25.10/32
    
    # pod-to-external-fqdn-allow-google-cnp
      egress:
      - toPorts:
        - ports:
          - port: "53"
            protocol: ANY
          rules:
            dns:
            - matchPattern: '*'
        toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
        toCIDR:
        - 169.254.25.10/32
    

    Q. I have created a service resource of LoadBalancer type, but it is maintaining the Pending status.

    A. If the Load Balancer subnet selected during cluster creation has been deleted, or if it has no IP addresses available for assignment, an External-IP cannot be assigned. You can change the default Load Balancer subnet in the following way:

      • Refer to the following to check the configmap that has the name ncloud-config in kube-system Namespace and edit it.
      $ kubectl --kubeconfig $KUBE_CONFIG get configmap ncloud-config -n kube-system
        NAME            DATA   AGE
        ncloud-config   9      131m
      
      $ kubectl --kubeconfig $KUBE_CONFIG get configmap ncloud-config -n kube-system -o yaml
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: ncloud-config
          namespace: kube-system
        data:
          acgNo: "12345"
          apiUrl: https://nks.apigw.ntruss.com
          basePath: /ncloud-api/v1
          lbPublicSubnetNo: "98765"
          lbSubnetNo: "12345"
          regionCode: KR
          regionNo: "11"
          vpcNo: "12345"
          zoneNo: "110"
      
      • data.acgNo: enter the instance ID of the ACG assigned to the eth0 interface of the worker node.
      • data.apiUrl: enter https://nks.apigw.ntruss.com.
      • data.basePath: enter /ncloud-api/v1.
      • data.lbPublicSubnetNo: enter SubnetID of the public subnet dedicated for load balancer in VPC where the worker node is assigned.
      • data.lbSubnetNo: enter SubnetID of the private subnet dedicated for load balancer in VPC where the worker node is assigned.
      • data.regionCode: enter the region code where the worker node is located. (e.g., "FKR")
      • data.regionNo: enter the region number of the worker node. (e.g. "11")
      • data.vpcNo: enter the VPC ID of the VPC to which the worker node is assigned.
      • data.zoneNo: enter the zone number where the worker node is located. (e.g. "110")
      • Run the following command to change the Load Balancer subnet.
      $ kubectl --kubeconfig $KUBE_CONFIG -n kube-system patch configmap ncloud-config --type='json' -p='[{"op":"replace", "path":"/data/lbSubnetNo", "value": "94465"}]'
      
      • The above is an example command for a subnet ID of 94465.
      • Use the SubnetID of the Load Balancer subnet in the VPC where the Kubernetes worker node is created. If you use an invalid SubnetID, Load Balancer cannot be properly created after the subnet is changed.
      • After the change is completed, run the following command to check whether the change has been applied to configmap.
      $ kubectl --kubeconfig $KUBE_CONFIG get configmap ncloud-config -n kube-system -o yaml
      

    Q. Application Load Balancer (ALB) connected with Ingress does not operate or cannot be properly created.


    A. This happens with an ALB created through the ALB Ingress Controller. Run the following commands to view the relevant resources and logs.

    $ kubectl --kubeconfig $KUBE_CONFIG logs -n kube-system -l app.kubernetes.io/name=alb-ingress-controller
    $ kubectl --kubeconfig $KUBE_CONFIG describe ingress [ingress-name]
    
    • If no ALB Ingress Controller exists, ALB cannot be created. You can install ALB Ingress Controller by referring to ALB Ingress Controller setting guide.
    • If the Ingress manifest has an error, the ALB or related resources may not be created properly. You can check the cause in the pod log of the ALB Ingress Controller.
      • If the rule inside Manifest is inconsistent with the rule of the created ALB
      • If an invalid annotation is used
      • If the service name or port written in the Manifest rule is incorrect
      • If the rules are in an incorrect order (The topmost rule is applied first.)
    • A problem may occur if you change the created ALB through the console or API. An ALB created through the ALB Ingress Controller is regularly synchronized with the manifest of the Ingress registered in the Kubernetes cluster. Refrain from making changes through the console or API whenever possible; if such a change has been made, reapply the manifest to synchronize the states.

    Q. I want to view the client IP.

    A. The Load Balancer types that can be created through Ncloud Kubernetes Service are NetworkLoadBalancer (NLB), NetworkProxyLoadBalancer (NPLB), and ApplicationLoadBalancer (ALB). How you view the client IP depends on the Load Balancer type.

    • ALB: since this type uses the HTTP/HTTPS protocol, you can view the client IP through the X-Forwarded-For header in the application.
    • NPLB: you can enable proxy-protocol to view the client IP. To do so, add the relevant annotation to the service details and enable the proxy-protocol setting in the application. For example, if you are using nginx-ingress-controller, enable its proxy-protocol setting.
    • NLB: the client IP can be displayed in the Load Balancer, but for this you need to make certain settings in the cluster. In the service details, change "externalTrafficPolicy" to "Local" to be able to see the client IP. For more information on "externalTrafficPolicy," see Official documentation.
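    For the NLB case above, a minimal Service manifest with "externalTrafficPolicy" set to "Local" might look like the following; the Service name, selector, and ports are illustrative:

    ```yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: nlb-service                # hypothetical name
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local     # preserves the client source IP; traffic only reaches nodes running the pods
      selector:
        app: my-app                    # placeholder selector
      ports:
        - port: 80
          targetPort: 8080
    ```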

    Q. Packet drop occurs in the cluster after I have changed kernel parameters.

    A. When Cilium, the CNI of Ncloud Kubernetes Service, starts, it dynamically disables "rp_filter". Because the settings in the sysctl.conf file are not changed by Cilium, take caution when applying the sysctl.conf file to change specific kernel parameters of a worker node: if the changed parameters are inconsistent with the Cilium settings, packet drops may occur. Before you change and apply the sysctl.conf file, first set the rp_filter setting "net.ipv4.conf.all.rp_filter" to "0". There are other reasons for caution when changing kernel parameters; when you are unsure, use a customer inquiry on the NAVER Cloud Platform portal.
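    As a sketch, a sysctl.conf fragment that stays consistent with Cilium while changing another parameter might look like this (the second parameter is purely illustrative); apply it with `sysctl -p` afterward:

    ```
    # /etc/sysctl.conf (fragment)
    net.ipv4.conf.all.rp_filter = 0   # keep reverse-path filtering off, as Cilium expects
    net.ipv4.ip_forward = 1           # illustrative example of another kernel parameter change
    ```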

    Q. I want to change the target group settings when creating ALB.

    A. You can change the Service specifications set for Ingress's backend. Using annotations, you can set the load balancing algorithm and health check of the target group created through Service.
    For example, the following setting performs load balancing through the least-connection algorithm for the target groups selected through the "naver" Service.

    apiVersion: v1
    kind: Service
    metadata:
      annotations: 
        alb.ingress.kubernetes.io/algorithm-type: "least-connection"
      name: naver
    

    For annotations available in Service and Ingress, see ALB Ingress Controller setting.

    Ncloud Kubernetes Service security

    Q. I cannot access Kubernetes cluster through ncp-iam-authenticator.


    A. You can analyze the cause through the error messages that appear while you use ncp-iam-authenticator.

    • Cluster is undefined (400): occurs when you enter an incorrect cluster UUID.
    • Authentication Failed (200): occurs when you use an incorrect authentication key value. ncp-iam-authenticator uses the authentication key value under ~/.ncloud/configure. You need to check whether the authentication key is expired, deleted, disabled or incorrect.
    • Not Found Exception (404): occurs when ncloud_api_url or region has an incorrect value.
    • You must be logged in to the server (Unauthorized): occurs when the account (sub account) does not have the relevant permissions in the cluster. When you create a cluster, the main account and cluster creation account are automatically set to the system:masters group, but other accounts require separate permissions settings. Make permission settings for the accounts by referring to IAM authentication and user management.
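    The authentication key values mentioned above are read from `~/.ncloud/configure`. A typical layout is sketched below; the key values are placeholders, the API URL is illustrative, and field names other than the ncloud_api_url and region fields mentioned above are assumptions that may differ in your environment:

    ```
    [DEFAULT]
    ncloud_access_key_id = <access-key-id>          # placeholder credential
    ncloud_secret_access_key = <secret-access-key>  # placeholder credential
    ncloud_api_url = https://ncloud.apigw.ntruss.com
    region = KR
    ```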

    Ncloud Kubernetes Service expandability and performance

    Q. Cluster AutoScaler is enabled but Scale Up/Down does not occur.

    A. The Cluster Autoscaler provided in Ncloud Kubernetes Service adds nodes when a pod cannot be scheduled due to a lack of resources, and removes nodes when certain nodes show low utilization for a certain amount of time. Cluster Autoscaler may not operate for various reasons.

    • It does not operate if pods are missing resource request and limit settings. Cluster Autoscaler requires resource requests and limits to be set in order to operate.
    • It does not operate if there is no HPA (Horizontal Pod Autoscaler) setting. Pods are increased through HPA, and Cluster Autoscaler operates based on the resource requests of the created pods. You can make Cluster Autoscaler operate by configuring HPA.
    • If any of the exceptions to node removal apply, nodes may not be removed. The following exceptions may exist:
      • A pod on the node is not controlled by a controller (e.g., Deployment, StatefulSet)
      • A pod on the node uses local storage
      • A pod cannot be moved to another node
      • The annotation "cluster-autoscaler.kubernetes.io/safe-to-evict" is set to "false"
      • For other exceptions, see Official documentation.
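    Putting the conditions above together, a scale-down-friendly Deployment pod template might look like the following; the names, image, and values are illustrative:

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: scalable-app              # hypothetical name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: scalable-app
      template:
        metadata:
          labels:
            app: scalable-app
          annotations:
            cluster-autoscaler.kubernetes.io/safe-to-evict: "true"  # allow the autoscaler to evict these pods
        spec:
          containers:
            - name: app
              image: nginx:1.25       # illustrative image
              resources:
                requests:             # required for Cluster Autoscaler to reason about capacity
                  cpu: "100m"
                  memory: "128Mi"
                limits:
                  cpu: "200m"
                  memory: "256Mi"
    ```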

    Q. I want to remove certain nodes while reducing the node pool manually.

    A. When you reduce the node pool manually, nodes that meet the following conditions are removed first.

    • Stopped nodes
    • Nodes with earlier creation date

    If you wish to remove a specific node, go to Kubernetes Service > Node pools > Node Pool Details > Delete Nodes, or stop the node first and then edit the number of nodes.

    Q. I want to change or view parameters when using Cluster AutoScaler.

    A. Cluster AutoScaler uses default parameters for its settings, and these parameters cannot be edited. You can view the default parameters used by Cluster AutoScaler in the following documentation.

    Ncloud Kubernetes Service monitoring and logging

    Q. Resources cannot be viewed on Kubernetes Dashboard.


    A. The Dashboard (Kubernetes Dashboard) in Ncloud Kubernetes Service uses the same access permissions as the cluster permissions. The main account and the cluster creation account (sub account) on NAVER Cloud Platform have the administrator permission, which grants access to the Dashboard without additional settings. Other sub accounts, however, need the cluster use permission to access the Dashboard.

    For more information on how to assign the cluster permission, see the guide documentation.

