What is Kubernetes StatefulSet and why do we need it?

In this tutorial I am going to cover yet another important Kubernetes topic: the StatefulSet.


Prerequisites

To follow the practical demonstration on your own, I assume that you already have a healthy 3-node Kubernetes cluster provisioned.

My environment

Node      IP              HostName                  OS            Kubernetes version   Docker version
Master    172.32.32.100   kmaster-ft.example.com    Ubuntu 18.04  v1.19.3              19.03.6
Worker 1  172.32.32.101   kworker-ft1.example.com   Ubuntu 18.04  v1.19.3              19.03.6
Worker 2  172.32.32.102   kworker-ft2.example.com   Ubuntu 18.04  v1.19.3              19.03.6

Before we start the practical demonstration, let's first understand what a Kubernetes StatefulSet is.

What is Kubernetes StatefulSet?

As per the Kubernetes official documentation, a StatefulSet is the workload API object used to manage stateful applications.

Not quite clear, right?

Let's first understand what stateful and stateless mean, in an easier and more detailed manner.

What is a stateful application?

A stateful application is one that saves its data to persistent disk storage, to be used by the application's clients, by other dependent applications, or by the server itself. For example, a database is a stateful application, as is any key-value store to which data is saved and retrieved by other applications. Some popular examples of stateful applications are MongoDB, Cassandra, and MySQL.

What is a stateless application?

A stateless application is one that does not save, on the server side, any client data generated in one session for use in the next session with that client. Instead, the client is responsible for storing and handling all application state-related information on the client side. This improves the performance of applications.

For example, web servers (such as Apache, Nginx, or Tomcat) serving RESTful APIs: they do not care which network they are using, and they do not need permanent storage either.

Now that we have a good understanding of stateful and stateless applications, we can go back to our original topic: understanding the Kubernetes StatefulSet.

When to use a Kubernetes StatefulSet?

If you have a stateless app that needs to be deployed on a Kubernetes cluster, go with a Deployment. As far as a Deployment is concerned, Pods are interchangeable: the client never cares which pod serves its request (an Nginx web server, for example).

But if you want to deploy stateful applications such as databases, you must go with a Kubernetes StatefulSet, because unlike a Deployment, the StatefulSet provides certain guarantees about the identity of the pods it is managing (that is, predictable names) and about the startup order.

Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of its Pods. These pods are created from the same spec but are not interchangeable: each has a persistent identifier (for example, its hostname) that it maintains across any rescheduling.

If you want to use storage volumes to provide persistence for your workload, you can use a StatefulSet as part of the solution. Although individual Pods in a StatefulSet are susceptible to failure, the persistent Pod identifiers make it easier to match existing volumes to the new Pods that replace any that have failed.

What do you achieve from StatefulSet Deployments?

  • Stable, unique network identifiers: Each pod in a StatefulSet is given a hostname based on the application name plus a zero-based ordinal index: for example, redis-cluster-0, redis-cluster-1, redis-cluster-2, and redis-cluster-3 for a StatefulSet named “redis-cluster” that has 4 instances running.
  • Stable, persistent storage: Every pod in the cluster is assigned its own persistent volume, based on the storage class we defined (or the default storage class, if none is defined). Deleting or scaling down pods will not automatically delete the volumes associated with them, so the data persists. To delete resources that are no longer needed, scale the StatefulSet down to 0 first, then delete the unused claims.
  • Ordered, graceful deployment and scaling: The pods of a StatefulSet are always created and brought online in a specific order, from ordinal 0 up to N-1, and they are shut down in reverse order, to ensure a reliable and repeatable deployment and runtime. The StatefulSet will not scale further until all the desired pods are running: if a pod dies, it is recreated before any attempt to add additional instances to meet the scaling criteria.
  • Ordered, automated rolling updates: If you've chosen the RollingUpdate strategy, then when you apply an updated manifest, StatefulSet pods are removed and replaced in reverse ordinal order. Each instance is shut down and rebuilt in turn, continuing until all the old-version instances have been shut down and cleaned up. Persistent volumes are reused, so the data automatically carries over to the upgraded instances.
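The naming and ordering rules above can be sketched in plain shell; nothing here talks to a cluster, it just reproduces the pattern a StatefulSet named redis-cluster with 4 replicas would follow:

```shell
# Pods get the StatefulSet name plus a zero-based ordinal, created in
# ascending order and terminated in reverse (sketch of the naming rule).
STATEFULSET="redis-cluster"
REPLICAS=4
for i in $(seq 0 $((REPLICAS - 1))); do
  echo "create: ${STATEFULSET}-${i}"
done
for i in $(seq $((REPLICAS - 1)) -1 0); do
  echo "delete: ${STATEFULSET}-${i}"
done
```

This prints create: redis-cluster-0 through redis-cluster-3, then delete: redis-cluster-3 back down to redis-cluster-0, mirroring the startup and teardown order described above.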

Practical Demonstration of Kubernetes StatefulSet

To understand the StatefulSet in its entirety, we will take the example of deploying a Redis Cluster on our Kubernetes cluster.

Step #1: Create Persistent Volume and Storage Class

A local persistent volume represents a local disk directly attached to a single Kubernetes Node. A StorageClass provides a way for administrators to describe the “classes” of storage they offer.

Here is our Persistent Volume (PV) and StorageClass manifest file. We are creating 4 persistent volumes, one for each Redis instance, on the node kworker-ft1.

root@kmaster-ft:~/statefulset/statefulset-ft-demo# cat local-pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-ft-demo-pv-1
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/local-storage1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kworker-ft1


---

apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-ft-demo-pv-2
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/local-storage2
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kworker-ft1

---

apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-ft-demo-pv-3
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/local-storage3
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kworker-ft1

---

apiVersion: v1
kind: PersistentVolume
metadata:
  name: statefulset-ft-demo-pv-4
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/local-storage4
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kworker-ft1

---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

NOTE: Local volumes do not currently support dynamic provisioning; however, a StorageClass should still be created to delay volume binding until Pod scheduling. This is specified by the WaitForFirstConsumer volume binding mode.
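Note also that the local PVs above point at directories that must already exist on the node. A small sketch that generates the setup commands (run them on kworker-ft1 itself, or pipe them through ssh with appropriate privileges):

```shell
# Emit the mkdir commands for the four backing directories used by the
# PV manifests above (paths /mnt/local-storage1 .. /mnt/local-storage4).
for i in 1 2 3 4; do
  echo "mkdir -p /mnt/local-storage${i}"
done
```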

Create the PV and Storage Class:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl apply -f local-pv.yml
persistentvolume/statefulset-ft-demo-pv-1 created
persistentvolume/statefulset-ft-demo-pv-2 created
persistentvolume/statefulset-ft-demo-pv-3 created
persistentvolume/statefulset-ft-demo-pv-4 created
storageclass.storage.k8s.io/local-storage created

Verify the PVs created:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl get pv
NAME                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS    REASON   AGE
statefulset-ft-demo-pv-1   10Gi       RWO            Delete           Bound    default/data-redis-cluster-0   local-storage            3m49s
statefulset-ft-demo-pv-2   10Gi       RWO            Delete           Bound    default/data-redis-cluster-2   local-storage            3m49s
statefulset-ft-demo-pv-3   10Gi       RWO            Delete           Bound    default/data-redis-cluster-3   local-storage            3m49s
statefulset-ft-demo-pv-4   10Gi       RWO            Delete           Bound    default/data-redis-cluster-1   local-storage            3m49s
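The CLAIM column shows the PVCs that bind once the StatefulSet from Step #2 is running. PVCs created from a volumeClaimTemplate are named after the template plus the pod; a quick sketch of that naming rule (the template name data comes from the manifest in Step #2):

```shell
# PVCs from a volumeClaimTemplate are named <template>-<pod>, e.g.
# data-redis-cluster-0 for template "data" and pod redis-cluster-0.
TEMPLATE="data"
for i in 0 1 2 3; do
  echo "${TEMPLATE}-redis-cluster-${i}"
done
```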

Verify the StorageClass created:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl get storageclass local-storage
NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-storage   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  4m2s

Step #2: Create StatefulSet

Here is the StatefulSet manifest file. (We will go through each part of this file to explain what it is expected to do.)

root@kmaster-ft:~/statefulset/statefulset-ft-demo# cat statefulset.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster
data:
  update-node.sh: |
    #!/bin/sh
    REDIS_NODES="/data/nodes.conf"
    sed -i -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" ${REDIS_NODES}
    exec "$@"
  redis.conf: |+
    cluster-enabled yes
    cluster-require-full-coverage no
    cluster-node-timeout 15000
    cluster-config-file /data/nodes.conf
    cluster-migration-barrier 1
    appendonly yes
    protected-mode no
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 4
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:5.0.1-alpine
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        command: ["/conf/update-node.sh", "redis-server", "/conf/redis.conf"]
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: conf
          mountPath: /conf
          readOnly: false
        - name: data
          mountPath: /data
          readOnly: false
      volumes:
      - name: conf
        configMap:
          name: redis-cluster
          defaultMode: 0755
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "local-storage"
      resources:
        requests:
          storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
spec:
  clusterIP: None
  ports:
  - port: 6379
    targetPort: 6379
    name: client
  - port: 16379
    targetPort: 16379
    name: gossip
  selector:
    app: redis-cluster

The Kubernetes StatefulSet YAML structure is almost identical to a Deployment's. The only difference is that we need a serviceName, so we must define a Service that is going to expose our pods.

  • In the first part we have a ConfigMap defined, which will be consumed by the Redis instances.
  • In the second part we have the StatefulSet defined, with its serviceName, replicas, and label selectors for the pods.
  • We use the redis:5.0.1-alpine image for our pod containers, and we define the ports to be exposed, the entry command for the containers to run, and a few environment variables.
  • Then we have our volumeMounts, where we make use of our ConfigMap.
  • Then we have volumeClaimTemplates, where we make use of the PVs and the storage class created in the first step.
  • Lastly we have a headless Service, which is responsible for the network identity of the Pods.
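The update-node.sh script in the ConfigMap rewrites the IP on the "myself" line of Redis's nodes.conf with the pod's current IP, so a rescheduled pod rejoins the cluster under its new address. Here is a standalone sketch of that sed against a made-up nodes.conf (the node IDs and IPs below are illustrative):

```shell
# Simulate update-node.sh: replace the recorded IP on the "myself" line
# of a sample nodes.conf with the pod's current IP (POD_IP).
POD_IP="192.168.77.185"
NODES_CONF="$(mktemp)"
cat > "${NODES_CONF}" <<'EOF'
c836f4cd 192.168.77.181:6379@16379 myself,master - 0 0 1 connected 0-4095
6877f0d1 192.168.77.182:6379@16379 master - 0 1 2 connected 4096-8191
EOF
sed -i -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" "${NODES_CONF}"
cat "${NODES_CONF}"
```

Only the "myself" line is touched; the entries for the other nodes stay exactly as they were gossiped.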

NOTE: StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.

It is very important to understand this concept.

What are Headless Services?

Sometimes you don’t need load-balancing and a single Service IP. In this case, you can create what are termed “headless” Services, by explicitly specifying "None" for the cluster IP (.spec.clusterIP).

For headless Services, a cluster IP is not allocated, kube-proxy does not handle these Services, and there is no load balancing or proxying done by the platform for them. How DNS is automatically configured depends on whether the Service has selectors defined:

With selectors

For headless Services that define selectors, the endpoints controller creates Endpoints records in the API, and modifies the DNS configuration to return records (addresses) that point directly to the Pods backing the Service.

Without selectors

For headless Services that do not define selectors, the endpoints controller does not create Endpoints records. However, the DNS system looks for and configures either:

  • CNAME records for ExternalName-type Services.
  • A records for any Endpoints that share a name with the Service, for all other types

In our manifest file we have the selectors defined (app: redis-cluster).
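With the headless Service and selectors in place, each pod gets a stable DNS record of the form <pod>.<service>.<namespace>.svc.cluster.local. A sketch of the names this demo produces (assuming the default namespace and the default cluster.local cluster domain):

```shell
# Per-pod DNS names behind the headless Service "redis-cluster"
# (assumes namespace "default" and cluster domain "cluster.local").
SERVICE="redis-cluster"
NAMESPACE="default"
for i in 0 1 2 3; do
  echo "redis-cluster-${i}.${SERVICE}.${NAMESPACE}.svc.cluster.local"
done
```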

Let us deploy the above manifest file.

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl apply -f statefulset.yaml
configmap/redis-cluster unchanged
statefulset.apps/redis-cluster created
service/redis-cluster created

Verify the StatefulSet:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl get statefulset -o wide
NAME            READY   AGE     CONTAINERS   IMAGES
redis-cluster   4/4     3m41s   redis        redis:5.0.1-alpine

Verify the pods running:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl get pods -o wide
NAME              READY   STATUS    RESTARTS   AGE   IP               NODE          NOMINATED NODE   READINESS GATES
redis-cluster-0   1/1     Running   0          26s   192.168.77.181   kworker-ft1   <none>           <none>
redis-cluster-1   1/1     Running   0          23s   192.168.77.182   kworker-ft1   <none>           <none>
redis-cluster-2   1/1     Running   0          19s   192.168.77.183   kworker-ft1   <none>           <none>
redis-cluster-3   1/1     Running   0          16s   192.168.77.184   kworker-ft1   <none>           <none>

You should notice that all the pods have been created on the kworker-ft1 node, since we are using PVs provisioned on that node for data storage.

Describe the service created:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl describe svc redis-cluster
Name:              redis-cluster
Namespace:         default
Labels:            <none>
Annotations:       <none>
Selector:          app=redis-cluster
Type:              ClusterIP
IP:                None
Port:              client  6379/TCP
TargetPort:        6379/TCP
Endpoints:         192.168.77.181:6379,192.168.77.182:6379,192.168.77.183:6379 + 1 more...
Port:              gossip  16379/TCP
TargetPort:        16379/TCP
Endpoints:         192.168.77.181:16379,192.168.77.182:16379,192.168.77.183:16379 + 1 more...
Session Affinity:  None
Events:            <none>

Step #3: Start and verify the Redis Cluster deployment

To do this, we run the following commands and type yes to accept the configuration.

root@kmaster-ft:~/statefulset/statefulset-ft-demo# IPs=$(kubectl get pods -l app=redis-cluster -o jsonpath='{range .items[*]}{.status.podIP}:6379 {end}')

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl exec -it redis-cluster-0 -- /bin/sh -c "redis-cli -h 127.0.0.1 -p 6379 --cluster create ${IPs}"
>>> Performing hash slots allocation on 4 nodes...
Master[0] -> Slots 0 - 4095
Master[1] -> Slots 4096 - 8191
Master[2] -> Slots 8192 - 12287
Master[3] -> Slots 12288 - 16383
M: c836f4cd6b9ef4aa4d1f56a706e5687aea3d89a9 192.168.77.181:6379
   slots:[0-4095] (4096 slots) master
M: 6877f0d17e24e04dda1ba5e25037568313d42a81 192.168.77.182:6379
   slots:[4096-8191] (4096 slots) master
M: 62dc5d4c89dc3ca9aa0c3258efe6e0442dce1d30 192.168.77.183:6379
   slots:[8192-12287] (4096 slots) master
M: d4f1e6345f6327d12e5e8688d1ed660052a155c6 192.168.77.184:6379
   slots:[12288-16383] (4096 slots) master
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
....
>>> Performing Cluster Check (using node 192.168.77.181:6379)
M: c836f4cd6b9ef4aa4d1f56a706e5687aea3d89a9 192.168.77.181:6379
   slots:[0-4095] (4096 slots) master
M: d4f1e6345f6327d12e5e8688d1ed660052a155c6 192.168.77.184:6379
   slots:[12288-16383] (4096 slots) master
M: 62dc5d4c89dc3ca9aa0c3258efe6e0442dce1d30 192.168.77.183:6379
   slots:[8192-12287] (4096 slots) master
M: 6877f0d17e24e04dda1ba5e25037568313d42a81 192.168.77.182:6379
   slots:[4096-8191] (4096 slots) master
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
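The slot allocation printed above follows from dividing Redis Cluster's 16384 hash slots evenly across the 4 masters; the ranges can be reproduced with a little arithmetic:

```shell
# Redis Cluster has 16384 hash slots; with 4 masters each gets 4096,
# giving the contiguous ranges shown in the --cluster create output.
SLOTS=16384
MASTERS=4
PER=$((SLOTS / MASTERS))
for i in $(seq 0 $((MASTERS - 1))); do
  echo "Master[${i}] -> Slots $((i * PER)) - $(((i + 1) * PER - 1))"
done
```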

Step #4: Test the Redis Cluster by deploying the Hit Counter App

We’ll deploy a simple hit-counter app into our cluster and put a load balancer in front of it. The app increments a counter, stores the value in the Redis cluster, and then returns the counter value as an HTTP response in the UI.

And since we have persistent storage configured, the data will not be lost even if we delete the pods.

Here is the application manifest file:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# cat example-app.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: hit-counter-lb
spec:
  type: NodePort
  ports:
  - port: 80
    protocol: TCP
    targetPort: 5000
    nodePort: 32200
  selector:
      app: myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hit-counter-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: rakeshrhcss/hit-counter-app-redis:1.0
        ports:
        - containerPort: 5000

Create the deployment and associated service.

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl apply -f example-app.yaml
service/hit-counter-lb created
deployment.apps/hit-counter-app created

Verify the pods running:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP               NODE          NOMINATED NODE   READINESS GATES
hit-counter-app-548b8c58f-s8c6f   1/1     Running   0          12s     192.168.17.30    kworker-ft2   <none>           <none>
redis-cluster-0                   1/1     Running   0          6m3s    192.168.77.181   kworker-ft1   <none>           <none>
redis-cluster-1                   1/1     Running   0          6m      192.168.77.182   kworker-ft1   <none>           <none>
redis-cluster-2                   1/1     Running   0          5m56s   192.168.77.183   kworker-ft1   <none>           <none>
redis-cluster-3                   1/1     Running   0          5m53s   192.168.77.184   kworker-ft1   <none>           <none>

Verify the deployment:

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl get deploy hit-counter-app -o wide
NAME              READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES                                  SELECTOR
hit-counter-app   1/1     1            1           52s   myapp        rakeshrhcss/hit-counter-app-redis:1.0   app=myapp

Verify the service exposed (NodePort):

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl describe svc hit-counter-lb
Name:                     hit-counter-lb
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 app=myapp
Type:                     NodePort
IP:                       10.111.80.255
Port:                     <unset>  80/TCP
TargetPort:               5000/TCP
NodePort:                 <unset>  32200/TCP
Endpoints:                192.168.17.30:5000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Step #5: Verify the application deployment and StatefulSet running

Access the deployment from a web browser outside the cluster, using NodePort 32200 on any node IP (for example, http://172.32.32.101:32200).


Each time you hit the URL, the counter will be incremented by one. Try this once from the CLI as well.


Now I will hit the URL a few times to increase the counter to 10, and then I will delete a few pods.

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl delete pods redis-cluster-1 redis-cluster-2
pod "redis-cluster-1" deleted
pod "redis-cluster-2" deleted

Let's verify the pod status now.

root@kmaster-ft:~/statefulset/statefulset-ft-demo# kubectl get pods -o wide
NAME                              READY   STATUS    RESTARTS   AGE     IP               NODE          NOMINATED NODE   READINESS GATES
hit-counter-app-548b8c58f-s8c6f   1/1     Running   0          7m47s   192.168.17.30    kworker-ft2   <none>           <none>
redis-cluster-0                   1/1     Running   0          13m     192.168.77.181   kworker-ft1   <none>           <none>
redis-cluster-1                   1/1     Running   0          14s     192.168.77.185   kworker-ft1   <none>           <none>
redis-cluster-2                   1/1     Running   0          12s     192.168.77.186   kworker-ft1   <none>           <none>
redis-cluster-3                   1/1     Running   0          13m     192.168.77.184   kworker-ft1   <none>           <none>

New pods with the same names, and therefore the same stable DNS identities, have been created (note the new IPs for redis-cluster-1 and redis-cluster-2).

Let's verify whether our data is also persistent. If it is, the hit counter should continue from 11.


Yes, it's working as expected. The data is persistent, and the same pods are serving the requests.

This was all about StatefulSets. It's a bit of a lengthy article, but it should help you a lot in understanding the core concepts of Kubernetes StatefulSets.

Hope you like the article. Please let me know your feedback in the response section.

Thanks. Happy learning!

Related Articles

Kubernetes Tutorial for Beginners [10 Practical Articles]

Reference:

Kubernetes official guide

FOSS TechNix

FOSS TechNix (Free, Open Source Software and Technology Nix*), founded in 2019, is a community platform where you can find how-to guides and articles for DevOps tools, Linux, and databases.
