Using Cluster Autoscaler


Available in VPC

Kubernetes provides two features for cluster autoscaling and resource management: Cluster Autoscaler and Vertical Pod Autoscaler. Cluster Autoscaler automatically increases or decreases the number of worker nodes in a cluster, allowing for more flexible and efficient management of cluster resources.

This guide walks you through the features and usage of Cluster Autoscaler in Ncloud Kubernetes Service.

Cluster Autoscaler

Cluster Autoscaler automatically adjusts the size of a Kubernetes cluster to optimize resource usage. When a pod cannot be scheduled due to a lack of resources, the cluster is expanded. If some nodes in the cluster consistently show low resource usage, those nodes are removed to prevent resource waste. This process is performed automatically based on the pods' requirements and the current resource status of the cluster.

To operate Cluster Autoscaler, pod resource requests and limits should be defined in advance. This information is essential for Cluster Autoscaler to analyze the cluster's resource utilization and adjust the cluster size as necessary. Resource requests specify the minimum resources a pod needs to start, and resource limits restrict the maximum resources available to the pod.
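
For example, a minimal pod spec that defines both values could look like the following. The names, image, and resource figures are illustrative and not part of this guide.

    apiVersion: v1
    kind: Pod
    metadata:
      name: resource-demo                 # illustrative name
    spec:
      containers:
      - name: app
        image: nginx                      # example image; replace with your workload
        resources:
          requests:                       # minimum resources reserved for scheduling
            cpu: 100m
            memory: 128Mi
          limits:                         # maximum resources the container may use
            cpu: 250m
            memory: 256Mi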

Operation of the Cluster Autoscaler

Cluster Autoscaler analyzes the resource usage rate of a cluster and adjusts the size of the cluster as necessary. This process is automatically performed based on the requirements from the pods and the current resource status of the cluster.

  • Scale-up conditions
    • If a newly created pod remains in Pending status for more than a certain amount of time, and neither the cluster's maximum node count nor the node pool's maxSize limit has been reached, nodes are added.
    • Cluster Autoscaler decides whether to add nodes based on the resource requests (Requests) of the pending pods.
  • Scale-down conditions
    • If some nodes in the cluster consistently show low resource utilization, those nodes are removed.
    • A node is considered underutilized when the sum of the CPU/memory requests of its pods is less than the scale-down-utilization-threshold (default: 0.5) relative to the node's allocatable resources.
    • By default, if a node remains unneeded for scale-down-unneeded-time (default: 10 minutes), it is registered as a candidate for removal.
    • Cluster Autoscaler then verifies that the pods on the node can be safely moved to other nodes before removing the node.
    • However, nodes are excluded from scale-down in the following cases:
      • When a pod on the node is not managed by a controller such as a Deployment or StatefulSet
      • When a pod on the node uses local storage
      • When a pod cannot be moved to another node
      • When a pod has the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict" set to "false" (see the example below)
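
For example, to explicitly exclude a pod from being evicted during scale-down, you can add the safe-to-evict annotation to its pod template. The manifest below is a minimal sketch; the Deployment name, labels, and image are illustrative and not part of this guide.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: critical-app                  # illustrative name
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: critical-app
      template:
        metadata:
          labels:
            app: critical-app
          annotations:
            # tells Cluster Autoscaler not to evict this pod when scaling down
            cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
        spec:
          containers:
          - name: app
            image: nginx                  # example image; replace with your workload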

For more information on Cluster Autoscaler, see the Cluster Autoscaler FAQ.

Enabling Cluster Autoscaler

For Ncloud Kubernetes Service clusters, Cluster Autoscaler is disabled by default. You can easily enable this feature using the node pool settings in the console to automatically adjust the cluster size and optimize resource usage.

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Containers > Ncloud Kubernetes Service > Clusters.
  2. Select the node pool for which you want to enable Cluster Autoscaler.
  3. Click [Edit] at the top.
  4. Click [Settings] and enter the minimum and maximum number of nodes.
Note
  • The maximum number of nodes configured for Cluster Autoscaler is included in the total number of worker nodes in the cluster, so take this into account when managing the cluster.
  • When the feature is first configured, the number of nodes is not automatically increased even if the current number of nodes is smaller than the configured minimum. If necessary, adjust the number of nodes manually before enabling the feature.
  5. Click [Edit] to enable Cluster Autoscaler.
  6. Use kubectl to check the operation of Cluster Autoscaler.
    $ kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
    ...
    data:
      status: |
        Cluster-autoscaler status at 2024-02-20 07:04:01.141727455 +0000 UTC:
        Cluster-wide:
        Health:      Healthy (ready=5 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=5 longUnregistered=0)
                    LastProbeTime:      2024-02-20 07:04:01.121214914 +0000 UTC m=+2227353.262936722
                    LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
        ScaleUp:     NoActivity (ready=5 registered=5)
                    LastProbeTime:      2024-02-20 07:04:01.121214914 +0000 UTC m=+2227353.262936722
                    LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
        ScaleDown:   NoCandidates (candidates=0)
                    LastProbeTime:      2024-02-20 07:04:01.121214914 +0000 UTC m=+2227353.262936722
                    LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
        
        NodeGroups:
        Name:        node
        Health:      Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=8))
                    LastProbeTime:      2024-02-20 11:25:31.98194832 +0000 UTC m=+2243044.123670024
                    LastTransitionTime: 2024-02-20 11:13:49.540592255 +0000 UTC m=+2242341.682313956
        ScaleUp:     NoActivity (ready=2 cloudProviderTarget=2)
                    LastProbeTime:      2024-02-20 11:25:31.98194832 +0000 UTC m=+2243044.123670024
                    LastTransitionTime: 2024-02-20 11:25:11.920272102 +0000 UTC m=+2243024.061993805
        ScaleDown:   NoCandidates (candidates=0)
                    LastProbeTime:      2024-02-20 11:25:31.98194832 +0000 UTC m=+2243044.123670024
                    LastTransitionTime: 2024-02-20 11:25:11.920272102 +0000 UTC m=+2243024.061993805
    ...
    
Note

Consider the following when using Cluster Autoscaler.

  • Starting or stopping the Cluster Autoscaler feature takes approximately 1 to 5 minutes.
  • The number of nodes cannot be changed manually while Cluster Autoscaler is in use. To change the number of nodes manually, set the feature to Not set.
  • Cluster Autoscaler only applies to node pools within a cluster that have the feature enabled.

Cluster Autoscaler examples

Using Cluster Autoscaler and Horizontal Pod Autoscaler, you can conduct a load test in your Kubernetes cluster. The test allows you to check the dynamic adjustment of the number of pods and nodes in your cluster.

You can find more information on this example in the Horizontal Pod Autoscaler Walkthrough.

  1. Get the cluster ready for Cluster Autoscaler testing.

  2. Check if the Cluster Autoscaler and Metrics Server are operating normally.

    • Check the Cluster Autoscaler
    $ kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
    ...
    data:
      status: |
        Cluster-autoscaler status at 2024-02-20 07:04:01.141727455 +0000 UTC:
        Cluster-wide:
        Health:      Healthy (ready=5 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=5 longUnregistered=0)
                    LastProbeTime:      2024-02-20 07:04:01.121214914 +0000 UTC m=+2227353.262936722
                    LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
        ScaleUp:     NoActivity (ready=5 registered=5)
                    LastProbeTime:      2024-02-20 07:04:01.121214914 +0000 UTC m=+2227353.262936722
                    LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
        ScaleDown:   NoCandidates (candidates=0)
                    LastProbeTime:      2024-02-20 07:04:01.121214914 +0000 UTC m=+2227353.262936722
                    LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
        
        NodeGroups:
        Name:        node
        Health:      Healthy (ready=2 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=2 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=8))
                    LastProbeTime:      2024-02-20 11:25:31.98194832 +0000 UTC m=+2243044.123670024
                    LastTransitionTime: 2024-02-20 11:13:49.540592255 +0000 UTC m=+2242341.682313956
        ScaleUp:     NoActivity (ready=2 cloudProviderTarget=2)
                    LastProbeTime:      2024-02-20 11:25:31.98194832 +0000 UTC m=+2243044.123670024
                    LastTransitionTime: 2024-02-20 11:25:11.920272102 +0000 UTC m=+2243024.061993805
        ScaleDown:   NoCandidates (candidates=0)
                    LastProbeTime:      2024-02-20 11:25:31.98194832 +0000 UTC m=+2243044.123670024
                    LastTransitionTime: 2024-02-20 11:25:11.920272102 +0000 UTC m=+2243024.061993805
    ...
    
    • Check the Metrics Server
    $ kubectl top pods -n kube-system
    NAME                                      CPU(cores)   MEMORY(bytes)           
    cilium-mct9p                              7m           96Mi            
    cilium-operator-846758784c-t2hpj          2m           21Mi            
    cilium-qqwdl                              6m           88Mi            
    cilium-sspl4                              6m           88Mi
    ...
    
  3. Deploy the php-apache Deployment in the cluster.

    $ kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: php-apache
    spec:
      selector:
        matchLabels:
          run: php-apache
      replicas: 1
      template:
        metadata:
          labels:
            run: php-apache
        spec:
          containers:
          - name: php-apache
            image: registry.k8s.io/hpa-example
            ports:
            - containerPort: 80
            resources:
              limits:
                cpu: 500m
              requests:
                cpu: 200m
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: php-apache
      labels:
        run: php-apache
    spec:
      ports:
      - port: 80
      selector:
        run: php-apache
    
  4. Create and check the Horizontal Pod Autoscaler (HPA).

    • Create HPA
    $ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
    
    • Check HPA
    $ kubectl get hpa
    NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    php-apache   Deployment/php-apache   0%/50%    1         10        1          144m
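
    • (Optional) A declarative equivalent of the kubectl autoscale command above, written as a minimal sketch with the autoscaling/v2 HorizontalPodAutoscaler API; this manifest is an assumed alternative and not part of the original walkthrough
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: php-apache
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: php-apache
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50        # same 50% CPU target as --cpu-percent=50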
    
  5. Generate a load to check an increase in the number of pods.

    • Increase the load in a separate terminal
    $ kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
    
    • Observe HPA and confirm an increase in the number of replicas
    $ kubectl get hpa
    NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
    php-apache   Deployment/php-apache   106%/50%   1         10        7          154m
    
  6. Check that newly created pods are in Pending status, and verify the operation of Cluster Autoscaler.

    • Check the Pending status of pods
    $ kubectl get pods -o wide | grep php-apache
    ...
    php-apache-78c9f8cbf6-4n794          0/1     Pending   0             20s
    php-apache-78c9f8cbf6-c8xdc          0/1     Pending   0             24s
    php-apache-78c9f8cbf6-t75w2          0/1     Pending   0             24s
    php-apache-78c9f8cbf6-wc98s          1/1     Running   0             24s
    ...
    
    • Check the operation of the Cluster Autoscaler
    $ kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
    ...
    Cluster-wide:
      Health:      Healthy (ready=3 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=3 longUnregistered=0)
                   LastProbeTime:      2024-02-20 11:15:50.82048015 +0000 UTC m=+2242462.962201862
                   LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
      ScaleUp:     InProgress (ready=3 registered=3)
                   LastProbeTime:      2024-02-20 11:15:50.82048015 +0000 UTC m=+2242462.962201862
                   LastTransitionTime: 2024-02-20 11:14:29.611869206 +0000 UTC m=+2242381.753590773
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2024-02-20 11:15:50.82048015 +0000 UTC m=+2242462.962201862
                   LastTransitionTime: 2024-01-25 12:22:06.170599081 +0000 UTC m=+38.312320847
    
    NodeGroups:
      Name:        node
      Health:      Healthy (ready=1 unready=0 (resourceUnready=0) notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=8))
                   LastProbeTime:      2024-02-20 11:15:50.82048015 +0000 UTC m=+2242462.962201862
                   LastTransitionTime: 2024-02-20 11:13:49.540592255 +0000 UTC m=+2242341.682313956
      ScaleUp:     InProgress (ready=1 cloudProviderTarget=2)
                   LastProbeTime:      2024-02-20 11:15:50.82048015 +0000 UTC m=+2242462.962201862
                   LastTransitionTime: 2024-02-20 11:14:29.611869206 +0000 UTC m=+2242381.753590773
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2024-02-20 11:15:50.82048015 +0000 UTC m=+2242462.962201862
                   LastTransitionTime: 2024-02-20 11:13:49.540592255 +0000 UTC m=+2242341.682313956
    ...
    
  7. Check the expansion of worker nodes and the scheduling of pods.

    • View worker node expansion
    $ kubectl get nodes
    NAME                 STATUS            ROLES    AGE    VERSION
    node-1               Ready             <none>   13m    v1.27.9
    node-2               Ready             <none>   2m8s   v1.27.9
    node-3               Ready             <none>   88d    v1.27.9
    node-4               Ready             <none>   15d    v1.27.9
    
    • Check the Running status of pods
    $ kubectl get pods | grep php-apache
    
  8. Once the load is stopped, the number of pods decreases. Cluster Autoscaler then observes the resource usage of the worker nodes and removes unneeded nodes to shrink the cluster.

Note
  • Results may vary depending on the specifications of the nodes selected when configuring the cluster, so take the node specifications into account before testing.
  • Even after the load test is stopped, certain conditions may prevent immediate node removal. For example, if an important system pod is running on a node or if a pod uses local storage, node removal may be limited.
  • Because cluster expansion and reduction during the load test may differ from behavior under your actual settings, review the test results further before applying them to a production environment.

Troubleshooting

If you run into issues while using Cluster Autoscaler, see Ncloud Kubernetes Service troubleshooting - scalability and performance problems.