Horizontal and Vertical Autoscaling in Kubernetes

In this article we will cover Kubernetes autoscaling with the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA), with hands-on examples.

Kubernetes autoscaling helps manage workloads efficiently by ensuring applications get the right amount of resources while minimizing costs. There are two main autoscaling mechanisms:

  • Horizontal Pod Autoscaler (HPA): Scales the number of pods based on CPU, memory, or custom metrics.
  • Vertical Pod Autoscaler (VPA): Adjusts CPU and memory resource requests for existing pods dynamically.

This guide provides a step-by-step approach to setting up and using HPA and VPA in Kubernetes.

Prerequisites

Before starting, ensure you have:

  • A running Kubernetes cluster (Minikube, EKS, GKE, AKS, etc.)
  • kubectl installed and configured
  • Basic knowledge of Kubernetes Deployments and Pods

Step #1: Install the Metrics Server (Required for HPA)

HPA requires the Metrics Server to collect CPU and memory usage data.

Install Metrics Server on Minikube
minikube addons enable metrics-server
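On other clusters, the Metrics Server can be installed from the project's official release manifest (this assumes the standard components.yaml published by the metrics-server project):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml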

Verify the installation:

kubectl get deployment metrics-server -n kube-system
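Once the Metrics Server deployment is available, confirm that metrics are actually being collected (this can take a minute or two after installation):

kubectl top nodes
kubectl top pods -A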

Step #2: Install Vertical Pod Autoscaler (VPA)

VPA requires specific components to function properly.

1. Clone the VPA Repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
2. Deploy the VPA Components
./hack/vpa-up.sh
3. Verify the Installation
kubectl get pods -n kube-system | grep vpa
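The vpa-up.sh script deploys three components: the vpa-recommender, vpa-updater, and vpa-admission-controller. You can also confirm that the VPA custom resource definition was registered:

kubectl get crd | grep verticalpodautoscaler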

Horizontal Pod Autoscaler (HPA)

HPA automatically scales the number of replicas of a deployment based on CPU or memory usage.

Step #1: Deploy a Sample Application

Create a deployment with CPU requests/limits:

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "250m"
          limits:
            cpu: "500m"

Apply it:

kubectl apply -f deployment.yaml
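To confirm the rollout finished and both replicas are running:

kubectl rollout status deployment/nginx-deployment
kubectl get pods -l app=nginx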

Step #2: Expose the Deployment

Create a Service to access the Nginx pods.

kubectl expose deployment nginx-deployment --type=NodePort --port=80

Check the Service:

kubectl get svc nginx-deployment
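If you prefer a declarative setup, roughly the same Service can be defined in a manifest like the following sketch (equivalent to the kubectl expose command above):

apiVersion: v1
kind: Service
metadata:
  name: nginx-deployment
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80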

Step #3: Create a Horizontal Pod Autoscaler (HPA)

Create a file hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 30 # Default is 300s (5 min); lowered to 30s here
      policies:
      - type: Percent
        value: 50         # Reduce pods by 50% at a time
        periodSeconds: 30 # Every 30 seconds

Apply the HPA:

kubectl apply -f hpa.yaml

Check the HPA:

kubectl get hpa
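For the CPU target alone, the same autoscaler could also be created imperatively. Note that kubectl autoscale cannot express the custom scaleDown behavior, so the YAML above is preferred here:

kubectl autoscale deployment nginx-deployment --cpu-percent=10 --min=2 --max=5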

Step #4: Generate Load to Test Auto-Scaling

To see auto-scaling in action, generate CPU load:

kubectl run --rm -it --image=busybox load-generator -- /bin/sh

Inside the pod, run:

while true; do wget -q -O- http://nginx-deployment; done

Check if HPA scales the pods:

kubectl get hpa
kubectl get pods
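To follow the scaling activity live and see why the HPA made its decisions:

kubectl get hpa nginx-hpa --watch
kubectl describe hpa nginx-hpa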

When CPU utilization crosses the target, the HPA creates additional pods; you can watch the replica count increase in the kubectl get pods output.

When the CPU load drops again, the extra pods are removed after the scale-down stabilization window.

Vertical Pod Autoscaler (VPA)

VPA automatically adjusts CPU/memory requests of a running pod based on usage.

Step #1: Create a VPA Resource

Define a VPA for the nginx-deployment:

vpa.yaml:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx-deployment
  updatePolicy:
    updateMode: "Auto"

Apply it:

kubectl apply -f vpa.yaml

Check the VPA:

kubectl get vpa
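If you want to keep VPA's recommendations within known limits, the spec also accepts a resourcePolicy section. The bounds below are only example values; adjust them to your workload:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: nginx-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      minAllowed:
        cpu: "100m"
        memory: "64Mi"
      maxAllowed:
        cpu: "1"
        memory: "512Mi"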

Step #2: Check VPA Recommendations

kubectl get vpa nginx-vpa --output yaml

You will see recommended CPU/memory values.

  • Lower Bound → the minimum CPU/memory VPA considers sufficient
  • Upper Bound → the maximum CPU/memory VPA will recommend
  • Target → the recommended values VPA applies to the pods
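To pull out just the recommendation values, query the status field directly:

kubectl get vpa nginx-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'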

Step #3: Generate Load to Test Auto-Scaling

To see auto-scaling in action, generate CPU load:

kubectl run --rm -it --image=busybox load-generator -- /bin/sh

Inside the pod, run:

while true; do wget -q -O- http://nginx-deployment; done

After some time, check if pods restart with updated requests/limits:
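For example, list the pods and print the resource requests of one of the recreated pods (replace the placeholder with a real pod name from the output):

kubectl get pods
kubectl get pod <nginx-pod-name> -o jsonpath='{.spec.containers[0].resources}'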

Conclusion:

HPA and VPA help Kubernetes automatically adjust resources based on workload. HPA scales pods up or down, while VPA adjusts CPU and memory requests. Using them together ensures efficient resource management and better application performance. Try them out to keep your cluster running smoothly!

Related Articles:

Kubernetes Pod Troubleshooting Commands with Examples

Reference:

Autoscaling Workloads (Kubernetes official documentation)

Harish Reddy
