In this article, we are going to demonstrate Kubernetes cluster monitoring with Prometheus and Grafana.

Kubernetes is a complex environment with many moving resources, so monitoring even a small Kubernetes cluster is challenging. Monitoring a Kubernetes cluster requires an in-depth understanding of the application architecture and functionality in order to design and manage an effective solution.
In this tutorial I will demonstrate the entire process of implementing monitoring on a Kubernetes cluster with Prometheus, Alertmanager, and Grafana.
Before jumping into the practical steps, let us first understand these tools in brief.
What is Prometheus?
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It records real-time metrics in a time series database, collects them using an HTTP pull model, and offers flexible queries and real-time alerting. It can absorb a huge amount of data every second.
What is Alertmanager?
Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or Microsoft Teams. It also takes care of silencing and inhibition of alerts.
What is Grafana?
Grafana is open-source visualization and analytics software. It allows you to query, visualize, alert on, and explore your metrics no matter where they are stored. It provides charts, graphs, and alerts for the web when connected to supported data sources such as Prometheus, Graphite, etc.
NOTE: This article assumes that you have a Kubernetes cluster up and running and the kubectl CLI installed on your local system.
Our Kubernetes Cluster Environment:
We have one master (control plane node) and two worker nodes.
Node | IP | Hostname | Specs | Kubectl Version | Docker Version |
Master | 172.42.42.200 | kmaster-ft.example.com | 4 GB RAM, 2 vCPUs, 64 GB HDD | v1.18.8 | 19.03.6 |
Worker 1 | 172.42.42.201 | kworker-ft1.example.com | 2 GB RAM, 2 vCPUs, 64 GB HDD | v1.18.8 | 19.03.6 |
Worker 2 | 172.42.42.202 | kworker-ft2.example.com | 2 GB RAM, 2 vCPUs, 64 GB HDD | v1.18.8 | 19.03.6 |
Methods to deploy a monitoring environment on Kubernetes:
There are two methods we can use to deploy the monitoring environment on a Kubernetes cluster.
The first method is to use individual YAML configuration files for each resource, such as Deployments, StatefulSets, Services, ServiceAccounts, ClusterRoles, etc.
The second, and recommended, method is to use the Helm package manager. Helm is a package manager for Kubernetes, the equivalent of yum or apt on Red Hat and Ubuntu. Helm deploys charts, which you can think of as packaged applications. A chart is a collection of all your versioned, pre-configured application resources which can be deployed as one unit.
Helm Installation on Linux
root@kmaster-ft:~/monitoring# wget https://get.helm.sh/helm-v3.3.4-linux-amd64.tar.gz
root@kmaster-ft:/opt# tar -xvzf helm-v3.3.4-linux-amd64.tar.gz
root@kmaster-ft:/opt# mv linux-amd64/helm /usr/local/bin/
Verify the Helm installation and version:
root@kmaster-ft:/opt# helm version
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}
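The Prometheus and Grafana charts used later in this tutorial come from the (now deprecated) stable chart repository. If that repository is not already configured on your system, a minimal sketch like the following adds it; the URL shown is the commonly used archive location for the stable charts and is an assumption about your setup:
# add the archived stable chart repository and refresh the local chart index
helm repo add stable https://charts.helm.sh/stable
helm repo update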
Kubernetes cluster Monitoring with Prometheus and Grafana
In Prometheus, time series collection happens via a pull model over HTTP. Prometheus sends HTTP requests to targets (this is called scraping), and the responses (metrics data) are stored in its storage, the time series database (TSDB).
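As a minimal illustration of the pull model, a Prometheus scrape configuration looks roughly like the sketch below; the job name and target address are made-up examples and are not part of the deployment that follows:
scrape_configs:
  - job_name: 'example-app'          # hypothetical job name
    scrape_interval: 15s             # how often Prometheus pulls (scrapes) metrics
    static_configs:
      - targets: ['10.0.0.5:8080']   # hypothetical target exposing a /metrics endpoint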
Configure Persistent storage
Persistent storage is needed for Prometheus. (Persistent storage is any storage device that retains data after power to the device is turned off.) It can help protect critical data from eviction and reduce the chance of data loss.
In this tutorial we will install an NFS server on the master node, configure the [/home/nfsshare] directory as an NFS share for external persistent storage, and also configure dynamic volume provisioning with the NFS provisioner.
Kubernetes : Dynamic Volume Provisioning (NFS)
With the Dynamic Volume Provisioning feature for persistent storage, PVs (Persistent Volumes) are created dynamically when users create PVCs (Persistent Volume Claims), so the cluster administrator does not have to create PVs manually.
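To make this concrete, here is a sketch of a PVC that would be provisioned automatically once the nfs-client StorageClass (created later in this tutorial) exists; the claim name and size are purely illustrative:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim              # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs-client     # StorageClass registered by the NFS provisioner
  resources:
    requests:
      storage: 1Gi                 # illustrative size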
First configure the NFS server on the master node, and then configure dynamic volume provisioning with the NFS provisioner.
NFS : Configure NFS Server on Kubernetes Master
Configure the NFS server to share directories on your network.
root@kmaster-ft:~/monitoring# apt -y install nfs-kernel-server
Change the domain name in the idmapd.conf file:
root@kmaster-ft:~/monitoring# vim /etc/idmapd.conf
# line 6: uncomment and change to your domain name
Domain = example.com
Create an NFS Export Directory:
root@kmaster-ft:~/monitoring# mkdir /home/nfsshare
Change the ownership and permissions of the NFS export directory:
root@kmaster-ft:~/monitoring# chown -R nobody:nogroup /home/nfsshare
root@kmaster-ft:~/monitoring# chmod 777 /home/nfsshare
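The export directory also needs an entry in /etc/exports so the worker nodes are allowed to mount it. A typical entry for this setup might look like the following; the subnet and export options are assumptions about your network and should be adjusted to match it:
# /etc/exports — allow hosts on the cluster subnet to mount the share read-write
/home/nfsshare 172.42.42.0/24(rw,sync,no_subtree_check,no_root_squash)
Re-export the shares afterwards with exportfs -ra, or let the service restart below pick up the entry.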
Restart the NFS service:
root@kmaster-ft:~/monitoring# systemctl restart nfs-server
NFS : Configure NFS Client
Configure the NFS client on each worker node to mount the NFS share:
root@kworker-ft1:~# apt -y install nfs-common
Update the idmapd.conf file:
root@kworker-ft1:~# vim /etc/idmapd.conf
# line 6: uncomment and change to your domain name
Domain = example.com
To mount the NFS share dynamically whenever it is accessed, configure AutoFS.
root@kworker-ft1:~# apt -y install autofs
Update /etc/auto.master file:
root@kworker-ft1:~# vim /etc/auto.master
# add to the end
/- /etc/auto.mount
Create the /etc/auto.mount file with the following content:
root@kworker-ft1:~# cat /etc/auto.mount
/mnt -fstype=nfs,rw 172.42.42.200:/home/nfsshare
Restart the service:
root@kworker-ft1:~# systemctl restart autofs
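Since AutoFS mounts the share on demand, a quick way to verify it from a worker node is simply to access the mount point, which should trigger the mount:
ls /mnt       # accessing the path triggers the automount of 172.42.42.200:/home/nfsshare
df -h /mnt    # should now list the NFS share mounted on /mnt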
Now our NFS server setup is ready. Let's configure the NFS client provisioner on the master node:
root@kmaster-ft:~/monitoring# helm install nfs-client -n kube-system --set nfs.server=172.42.42.200 --set nfs.path=/home/nfsshare stable/nfs-client-provisioner
NAME: nfs-client
LAST DEPLOYED: Fri Oct 23 09:03:06 2020
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
Verify that the pods are running:
root@kmaster-ft:~/monitoring# kubectl get pods -n kube-system -w
NAME READY STATUS RESTARTS AGE
calico-node-2vs9l 1/1 Running 0 31m
calico-node-c6l89 1/1 Running 0 31m
calico-node-z2v8q 1/1 Running 0 31m
etcd-kmaster-ft 1/1 Running 21 30m
kube-apiserver-kmaster-ft 1/1 Running 11 30m
kube-controller-manager-kmaster-ft 1/1 Running 15 30m
kube-proxy-g42ls 1/1 Running 0 30m
kube-proxy-hq5dk 1/1 Running 0 30m
kube-proxy-v7gk6 1/1 Running 0 30m
kube-scheduler-kmaster-ft 1/1 Running 15 30m
nfs-client-nfs-client-provisioner-77699ff74f-r2fg9 1/1 Running 0 4m54s
At this point, our Kubernetes cluster has NFS configured as persistent storage for the monitoring resources.
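You can also confirm that the provisioner registered its StorageClass; the chart's default class name is nfs-client, which is the name we will reference in the chart values below:
kubectl get storageclass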
Now let's start deploying Prometheus.
Kubernetes : Deploy Prometheus
Like the jobs it monitors, Prometheus runs as a pod in your Kubernetes cluster.
root@kmaster-ft:~/monitoring# helm inspect values stable/prometheus > prometheus.yaml
Change the storageClass name in the file generated by the previous command.
root@kmaster-ft:~/monitoring# vim prometheus.yaml
alertmanager:
.....
.....
# line 213: uncomment and specify the [storageClass] to use
storageClass: "nfs-client"
.....
.....
server:
.....
.....
# line 803: uncomment and specify the [storageClass] to use
storageClass: "nfs-client"
.....
.....
pushgateway:
.....
.....
# line 1134: uncomment and specify the [storageClass] to use
storageClass: "nfs-client"
Create a separate namespace in which to deploy our monitoring resources:
root@kmaster-ft:~/monitoring# kubectl create namespace monitoring
namespace/monitoring created
Deploy the Prometheus Helm chart:
root@kmaster-ft:~/monitoring# helm install prometheus --namespace monitoring -f prometheus.yaml stable/prometheus
WARNING: This chart is deprecated
NAME: prometheus
LAST DEPLOYED: Fri Oct 23 08:44:49 2020
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9090
The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
prometheus-alertmanager.monitoring.svc.cluster.local
Get the Alertmanager URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9093
#################################################################################
###### WARNING: Pod Security Policy has been moved to a global property. #####
###### use .Values.podSecurityPolicy.enabled with pod-based #####
###### annotations #####
###### (e.g. .Values.nodeExporter.podSecurityPolicy.annotations) #####
#################################################################################
The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
prometheus-pushgateway.monitoring.svc.cluster.local
Get the PushGateway URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 9091
For more information on running Prometheus, visit:
https://prometheus.io/
Verify that the pods are running:
root@kmaster-ft:~/monitoring# kubectl get pods -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
create-vs-apply-demo 1/1 Running 0 3h59m 172.16.213.233 kworker-ft2 <none> <none>
create-vs-apply-demo-2 1/1 Running 0 3h46m 172.16.213.37 kworker-ft1 <none> <none>
prometheus-alertmanager-6b64586d49-t7w5v 2/2 Running 0 4h38m 172.16.213.28 kworker-ft1 <none> <none>
prometheus-kube-state-metrics-c65b87574-m8sh8 1/1 Running 0 4h38m 172.16.213.247 kworker-ft2 <none> <none>
prometheus-node-exporter-8lgx9 1/1 Running 0 4h38m 172.42.42.202 kworker-ft2 <none> <none>
prometheus-node-exporter-w8b4f 1/1 Running 0 4h38m 172.42.42.201 kworker-ft1 <none> <none>
prometheus-pushgateway-7d5f5746c7-pc2nq 1/1 Running 0 4h38m 172.16.213.29 kworker-ft1 <none> <none>
prometheus-server-f8d46859b-nxbmt 2/2 Running 0 4h38m 172.16.213.252 kworker-ft2 <none> <none>
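Because the chart was pointed at the nfs-client storageClass, the Alertmanager and Prometheus server PVCs should have been bound automatically by the NFS provisioner; you can confirm this with:
kubectl get pvc -n monitoring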
Deploy Grafana:
root@kmaster-ft:~/monitoring# helm inspect values stable/grafana > grafana.yaml
Change the storageClassName in the file generated by the previous command.
root@kmaster-ft:~/monitoring# vim grafana.yaml
# line 215: enable [persistence]
# line 216: uncomment and change to your [storageClass]
persistence:
  type: pvc
  enabled: true
  storageClassName: nfs-client
Now deploy the Grafana Helm chart:
root@kmaster-ft:~/monitoring# helm install grafana --namespace monitoring -f grafana.yaml stable/grafana
WARNING: This chart is deprecated
NAME: grafana
LAST DEPLOYED: Fri Oct 23 09:25:22 2020
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
1. Get your 'admin' user password by running:
kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
grafana.monitoring.svc.cluster.local
Get the Grafana URL to visit by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace monitoring -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace monitoring port-forward $POD_NAME 3000
3. Login with the password from step 1 and the username: admin
Verify that the pods are running:
root@kmaster-ft:~/monitoring# kubectl get pods -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
create-vs-apply-demo 1/1 Running 0 3h59m 172.16.213.233 kworker-ft2 <none> <none>
create-vs-apply-demo-2 1/1 Running 0 3h46m 172.16.213.37 kworker-ft1 <none> <none>
grafana-56b4b9fffc-8tbm2 1/1 Running 0 3h57m 172.16.213.33 kworker-ft1 <none> <none>
prometheus-alertmanager-6b64586d49-t7w5v 2/2 Running 0 4h38m 172.16.213.28 kworker-ft1 <none> <none>
prometheus-kube-state-metrics-c65b87574-m8sh8 1/1 Running 0 4h38m 172.16.213.247 kworker-ft2 <none> <none>
prometheus-node-exporter-8lgx9 1/1 Running 0 4h38m 172.42.42.202 kworker-ft2 <none> <none>
prometheus-node-exporter-w8b4f 1/1 Running 0 4h38m 172.42.42.201 kworker-ft1 <none> <none>
prometheus-pushgateway-7d5f5746c7-pc2nq 1/1 Running 0 4h38m 172.16.213.29 kworker-ft1 <none> <none>
prometheus-server-f8d46859b-nxbmt 2/2 Running 0 4h38m 172.16.213.252 kworker-ft2 <none> <none>
Verify the services that are running:
root@kmaster-ft:~/monitoring# kubectl get services --namespace monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana ClusterIP 10.108.66.249 <none> 80/TCP 4h8m
prometheus-alertmanager ClusterIP 10.111.89.244 <none> 80/TCP 4h48m
prometheus-kube-state-metrics ClusterIP 10.99.192.201 <none> 8080/TCP 4h48m
prometheus-node-exporter ClusterIP None <none> 9100/TCP 4h48m
prometheus-pushgateway ClusterIP 10.107.12.231 <none> 9091/TCP 4h48m
prometheus-server ClusterIP 10.107.106.125 <none> 80/TCP 4h48m
We are now done with the deployments of Prometheus, Alertmanager, and Grafana.
Now let us start accessing these resources. We can access them from the local nodes, but if you want your monitoring server and Grafana server to be accessible from outside the cluster as well, you need to expose the respective services.
If you want to access the Prometheus web UI from a host within the cluster, paste the URL below into your web browser: http://prometheus-server.monitoring.svc.cluster.local
Set up port forwarding to access Prometheus, Alertmanager, and Grafana from a client in your local network:
If you set up port forwarding, access the URL below from a client computer in your local network.
http://(Master Node Hostname or IP address):(port)/
root@kmaster-ft:~/monitoring# kubectl port-forward -n monitoring service/prometheus-server --address 0.0.0.0 9090:80
root@kmaster-ft:~/monitoring# kubectl port-forward -n monitoring service/grafana --address 0.0.0.0 3000:80
root@kmaster-ft:~/monitoring# kubectl port-forward -n monitoring service/prometheus-alertmanager --address 0.0.0.0 80:80
To access these UIs from outside of your network, which is what usually happens in the real world, we need to convert the ClusterIP service type to NodePort.
After that, your monitoring resources will be accessible from any outside system using the IP of any of your cluster nodes and the assigned NodePort.
Note:
A node port exposes the service on a static port on the node IP address. NodePorts are in the 30000-32767 range by default, which means a NodePort is unlikely to match a service’s intended port (for example, 8080 may be exposed as 31020).
Create services for Prometheus, Alertmanager, and Grafana
Create the following service YAML files.
Prometheus service –
root@kmaster-ft:~/monitoring# cat service.yml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9090'
spec:
  selector:
    app: prometheus
    component: server
  type: NodePort
  ports:
    - port: 8080
      targetPort: 9090
      nodePort: 30000
Grafana Service –
root@kmaster-ft:~/monitoring# cat service-grafana.yml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '3000'
spec:
  selector:
    app.kubernetes.io/name: grafana
    app.kubernetes.io/instance: grafana
  type: NodePort
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 30002
Alertmanager Service –
root@kmaster-ft:~/monitoring# cat service-alertmanager.yml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-alertmanager
  namespace: monitoring
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9093'
spec:
  selector:
    app: prometheus
    component: alertmanager
  type: NodePort
  ports:
    - port: 80
      targetPort: 9093
      nodePort: 30001
Now we have YAML config files created for all three components. Note that the selectors use the same pod labels referenced in the port-forward commands earlier. Apply the files to create the respective services.
root@kmaster-ft:~/monitoring# kubectl apply -f service.yml
root@kmaster-ft:~/monitoring# kubectl apply -f service-grafana.yml
root@kmaster-ft:~/monitoring# kubectl apply -f service-alertmanager.yml
List the running services:
root@kmaster-ft:~/monitoring# kubectl get services --namespace monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana NodePort 10.108.66.249 <none> 80:30002/TCP 4h21m
prometheus-alertmanager NodePort 10.111.89.244 <none> 80:30001/TCP 5h1m
prometheus-kube-state-metrics ClusterIP 10.99.192.201 <none> 8080/TCP 5h1m
prometheus-node-exporter ClusterIP None <none> 9100/TCP 5h1m
prometheus-pushgateway ClusterIP 10.107.12.231 <none> 9091/TCP 5h1m
prometheus-server ClusterIP 10.107.106.125 <none> 80/TCP 5h1m
prometheus-service NodePort 10.102.141.191 <none> 8080:30000/TCP 4h30m
Access the UIs from a browser:
Prometheus Server –
To access the Prometheus UI, open your browser at port 30000 on any cluster node, e.g. http://172.42.42.200:30000/

Sample query graph on the Prometheus UI:
As soon as it starts, the Prometheus pod begins accumulating data. After a few minutes, you'll have some records that you can query using Prometheus's powerful query language.
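Besides the graph UI, you can also query Prometheus over its HTTP API. As a quick sanity check, the built-in up metric shows which scrape targets are currently reachable; the node IP and NodePort below are the ones used in this tutorial:
curl 'http://172.42.42.200:30000/api/v1/query?query=up'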

Alertmanager UI –
To access the Alertmanager UI, open your browser at port 30001:
http://172.42.42.200:30001/#/alerts

Grafana UI –
To access the Grafana UI, open your browser at port 30002, e.g. http://172.42.42.200:30002/, and log in with the admin password retrieved earlier.

Adding data sources to Grafana:
To add a data source, hover over the cog icon in the side menu to open the configuration menu, then click Add data source to reach the settings page of your new data source. In the Name box, enter a name for this data source; in Type, select the type of data source (Prometheus in our case) and point the URL at the Prometheus server, for example http://prometheus-server.monitoring.svc.cluster.local. Click Save & Test.
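Alternatively, the Grafana chart can provision the data source for you. The chart's values file has a datasources section; a sketch like the following, added to grafana.yaml before installing or upgrading the release, registers Prometheus as the default data source (the exact key layout can vary between chart versions, so treat this as an illustration):
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus                                            # display name in Grafana
        type: prometheus
        url: http://prometheus-server.monitoring.svc.cluster.local  # in-cluster Prometheus service
        access: proxy
        isDefault: true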

Adding dashboard templates:
To import a dashboard, click the + icon in the side menu, and then click Import. From here you can upload a dashboard JSON file, paste a Grafana.com dashboard URL, or paste dashboard JSON text directly into the text area.

Here are a few Grafana dashboard templates which can be used for Kubernetes cluster and resource visualization:
ID – 3662, JSON – https://grafana.com/api/dashboards/3662/revisions/2/download
ID – 747, https://grafana.com/grafana/dashboards/747, JSON – https://grafana.com/api/dashboards/747/revisions/2/download
You can find more dashboards on the Grafana dashboards site – https://grafana.com/grafana/dashboards/
Conclusion:
Now that you have successfully installed Prometheus monitoring on a Kubernetes cluster, you can track the overall health, performance, and behavior of your system. How Prometheus, Alertmanager, and Grafana work internally is out of scope for this tutorial and will be covered in a separate one.
I hope you like the demonstration.
Please let us know your feedback below in the comment section.
Reader comment: Very nice guide, one caveat though: using NFS as a backing store for Prometheus is not supported and can lead to data corruption. More at https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
Author's reply: You are right. Prometheus includes a local on-disk time series database, but it can be integrated with remote storage systems, such as NFS or any third-party storage like NetApp. To maintain data across deployments and version upgrades, the data must be persisted to some volume. This article is just meant to show how to manually control the storage provisioning.