In this article, we are going to cover Kubernetes autoscaling with HPA and VPA (horizontal and vertical pod autoscaling in Kubernetes), with practical examples.
Kubernetes autoscaling helps manage workloads efficiently by ensuring applications get the right amount of resources while minimizing costs. There are two main autoscaling mechanisms:
- Horizontal Pod Autoscaler (HPA): Scales the number of pods based on CPU, memory, or custom metrics.
- Vertical Pod Autoscaler (VPA): Adjusts CPU and memory resource requests for existing pods dynamically.
This guide provides a step-by-step approach to setting up and using HPA and VPA in Kubernetes.
Prerequisites
Before starting, ensure you have:
- A running Kubernetes cluster (Minikube, EKS, GKE, AKS, etc.)
- kubectl installed and configured
- Basic knowledge of Kubernetes Deployments and Pods
Step #1: Install Horizontal Pod Autoscaler (HPA)
HPA requires the Metrics Server to collect CPU and memory usage data.
Install Metrics Server on Minikube
minikube addons enable metrics-server

Verify the installation:
kubectl get deployment metrics-server -n kube-system
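Once the Metrics Server deployment is available, you can confirm that metrics are actually being collected (it may take a minute or two before data appears):
kubectl top nodes
kubectl top pods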

Step #2: Install Vertical Pod Autoscaler (VPA)
VPA relies on three components to function properly: the recommender, the updater, and the admission controller.
1. Clone the VPA Repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

2. Deploy the VPA Components
./hack/vpa-up.sh

3. Verify the Installation
kubectl get pods -n kube-system | grep vpa
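All three components should show up as running pods. You can also confirm that the VPA CustomResourceDefinitions were created (assuming the standard vpa-up.sh install):
kubectl get crds | grep verticalpodautoscaler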

Horizontal Pod Autoscaler (HPA)
HPA automatically scales the number of replicas of a deployment based on CPU or memory usage.
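For reference, the HPA controller computes the desired replica count as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped between minReplicas and maxReplicas.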
Step #1: Deploy a Sample Application
Create a deployment with CPU requests/limits:
deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "250m"
          limits:
            cpu: "500m"
Apply it:
kubectl apply -f deployment.yaml
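Confirm that the Deployment and its pods are running before moving on:
kubectl get deployment nginx-deployment
kubectl get pods -l app=nginx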


Step #2: Expose the Deployment
Create a Service to access the Nginx pods.
kubectl expose deployment nginx-deployment --type=NodePort --port=80
Check the Service:
kubectl get svc nginx-deployment
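On Minikube, you can also get a reachable URL for the NodePort service and confirm Nginx responds:
minikube service nginx-deployment --url
curl $(minikube service nginx-deployment --url)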

Step #3: Create a Horizontal Pod Autoscaler (HPA)
Create a file hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30   # Default is 300s (5 min), lower it to 30s
      policies:
      - type: Percent
        value: 50                      # Reduce pods by 50% at a time
        periodSeconds: 30              # Every 30 seconds
Apply the HPA:
kubectl apply -f hpa.yaml
Check the HPA:
kubectl get hpa
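For a more detailed view of the current metrics, scaling events, and conditions, describe the HPA:
kubectl describe hpa nginx-hpa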

Step #4: Generate Load to Test Auto-Scaling
To see auto-scaling in action, generate CPU load:
kubectl run --rm -it --image=busybox load-generator -- /bin/sh

Inside the pod, run:
while true; do wget -q -O- http://nginx-deployment; done

Check if HPA scales the pods:
kubectl get hpa
kubectl get pods
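To follow the scaling activity live while the load generator is running, you can watch the HPA and the pods from another terminal:
kubectl get hpa nginx-hpa --watch
kubectl get pods -l app=nginx --watch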
When CPU utilization reaches the target, new pods are created, as you can see in the image below:

When the CPU load drops, the newly created pods are removed, as shown in the image below:

Vertical Pod Autoscaler (VPA)
VPA automatically adjusts CPU/memory requests of a running pod based on usage.
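VPA supports several update modes: Off (recommendations only), Initial (apply requests only when pods are created), and Recreate/Auto (evict running pods and recreate them with updated requests). This guide uses Auto.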
Step #1: Create a VPA Resource
Define a VPA for the nginx-deployment:
vpa.yaml:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx-deployment
  updatePolicy:
    updateMode: "Auto"
Apply it:
kubectl apply -f vpa.yaml
Check the VPA:
kubectl get vpa
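To see the recommendations and recent events in a readable form, describe the VPA:
kubectl describe vpa nginx-vpa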

Step #2: Check VPA Recommendations
kubectl get vpa nginx-vpa --output yaml
You will see recommended CPU/memory values.

- Lower Bound → the minimum CPU/memory the container needs
- Upper Bound → the maximum it should be allocated
- Target → the recommended value VPA will apply
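For reference, the recommendation appears under the VPA's status. A trimmed example is shown below; the actual values depend on your cluster's observed usage (the numbers here are purely illustrative):
status:
  recommendation:
    containerRecommendations:
    - containerName: nginx
      lowerBound:
        cpu: 25m
        memory: 262144k
      target:
        cpu: 25m
        memory: 262144k
      upperBound:
        cpu: 671m
        memory: 1.5Gi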
Step #3: Generate Load to Test Auto-Scaling
To see auto-scaling in action, generate CPU load:
kubectl run --rm -it --image=busybox load-generator -- /bin/sh

Inside the pod, run:
while true; do wget -q -O- http://nginx-deployment; done

After some time, check if pods restart with updated requests/limits:
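In Auto mode, the VPA updater evicts pods whose requests drift too far from the recommendation, and they are recreated with the new values. Replace <nginx-pod-name> with one of the recreated pod names:
kubectl get pods -l app=nginx
kubectl describe pod <nginx-pod-name> | grep -A 2 Requests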
Conclusion:
HPA and VPA help Kubernetes automatically adjust resources based on workload. HPA scales pods up or down, while VPA adjusts CPU and memory requests. Using them together ensures efficient resource management and better application performance. Try them out to keep your cluster running smoothly!
Related Articles:
Kubernetes Pod Troubleshooting Commands with Examples