Monitoring Your Kubernetes Cluster with Prometheus and Grafana

In today’s complex Kubernetes environments, having a robust monitoring solution is not just nice to have—it’s essential. This guide will walk you through setting up Prometheus and Grafana to monitor your K3s or any other Kubernetes cluster.

Why Prometheus and Grafana?

  • Prometheus: An open-source systems monitoring and alerting toolkit that collects and stores metrics as time series data
  • Grafana: A multi-platform open-source analytics and interactive visualization web application that provides charts, graphs, and alerts when connected to supported data sources

Together, they form a powerful monitoring stack that provides insights into your cluster’s health and performance.

Prerequisites

Before we begin, ensure you have:

  • A running Kubernetes cluster (this guide uses K3s)
  • kubectl configured to communicate with your cluster
  • Helm 3 installed

Installation using Helm

The easiest way to install Prometheus and Grafana is using the kube-prometheus-stack Helm chart, which includes:

  • Prometheus Operator
  • Prometheus instance
  • Alertmanager
  • Grafana
  • Node Exporter
  • Kube State Metrics

Let’s create a namespace and install the stack:

# Create a dedicated namespace
kubectl create namespace monitoring

# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=your-strong-password

Replace your-strong-password with a secure password for the Grafana admin user. Note that values passed with --set end up in your shell history; for production, prefer supplying secrets through a values file.
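Once the chart is installed, you can verify that all components come up and, if needed, read the admin password back from the secret the chart manages. The secret name below follows the kube-prometheus-stack convention for a release named prometheus; adjust it if your release name differs:

```shell
# Wait for all pods in the monitoring namespace to become Ready
kubectl get pods -n monitoring

# Retrieve the Grafana admin password from the chart-managed secret
# (secret name follows the "<release>-grafana" convention)
kubectl get secret -n monitoring prometheus-grafana \
  -o jsonpath='{.data.admin-password}' | base64 --decode
```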

Accessing the Dashboards

By default, the services are not exposed outside the cluster. To access them, you can use port-forwarding:

Grafana

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Then access Grafana at http://localhost:3000 with username admin and the password you specified during installation.

Prometheus

kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090

Access the Prometheus UI at http://localhost:9090.
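From the Graph tab in the Prometheus UI you can run ad-hoc PromQL queries to confirm metrics are flowing. A few starters (metric names assume the node-exporter and kube-state-metrics defaults shipped with this chart):

```
# CPU usage per node (fraction of non-idle time over 5m)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Memory available per node, in bytes
node_memory_MemAvailable_bytes

# Pods not currently in the Running phase
kube_pod_status_phase{phase!="Running"} > 0
```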

Setting Up Ingress (Optional)

For production environments, you’ll want to set up proper ingress. Here’s an example using Nginx ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-grafana
            port:
              number: 80
  tls:
  - hosts:
    - grafana.example.com
    secretName: grafana-tls

Apply this with kubectl apply -f ingress.yaml after replacing grafana.example.com with your domain.
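The Ingress above references a grafana-tls secret, which must exist in the monitoring namespace before TLS will work. If you have a certificate and key on hand (or one issued by a tool like cert-manager), creating the secret manually looks like this; the file paths are placeholders for your own certificate files:

```shell
# Create the TLS secret referenced by the Ingress
# (replace the paths with your actual certificate and key files)
kubectl create secret tls grafana-tls \
  --namespace monitoring \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key
```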

Important Dashboards for Kubernetes

Grafana comes with several pre-installed dashboards, but here are some essential ones you should import:

  1. Kubernetes Cluster Overview (ID: 10856)
  2. Node Exporter Full (ID: 1860)
  3. Kubernetes Resource Requests (ID: 13770)

To import a dashboard:

  1. Go to Grafana UI
  2. Click on the “+” icon in the sidebar
  3. Select “Import”
  4. Enter the dashboard ID
  5. Click “Load”
  6. Select the Prometheus data source
  7. Click “Import”
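If you prefer automating this over clicking through the UI, the same import can be sketched with Grafana's HTTP API. The data source input name DS_PROMETHEUS, the data source name Prometheus, and the localhost address are assumptions here; check the dashboard's JSON and your own Grafana endpoint before relying on them:

```shell
# Download the dashboard JSON from grafana.com (ID 1860 = Node Exporter Full)
curl -s https://grafana.com/api/dashboards/1860/revisions/latest/download \
  -o dashboard.json

# Import it via the Grafana API, binding the Prometheus data source
curl -s -u admin:your-strong-password -H "Content-Type: application/json" \
  http://localhost:3000/api/dashboards/import \
  --data "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true, \"inputs\": [{\"name\": \"DS_PROMETHEUS\", \"type\": \"datasource\", \"pluginId\": \"prometheus\", \"value\": \"Prometheus\"}]}"
```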

Setting Up Alerts

Let’s set up a basic alert for node CPU usage:

  1. In Grafana, go to Alerting > Alert Rules
  2. Click “New Alert Rule”
  3. Configure the query: instance:node_cpu_utilisation:rate5m
  4. Set the condition to: IS ABOVE 0.8
  5. Set evaluation interval: 1m
  6. Set “For”: 5m (alert will fire if condition is true for 5 minutes)
  7. Add labels and annotations as needed
  8. Save the rule
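Because the stack includes the Prometheus Operator, the same alert can instead be declared as a PrometheusRule resource, which keeps the rule in version control and independent of Grafana. A sketch, assuming the default rule selector of a release named prometheus (verify the expected labels with kubectl get prometheus -n monitoring -o yaml):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-cpu-alert
  namespace: monitoring
  labels:
    release: prometheus   # must match the Prometheus ruleSelector
spec:
  groups:
  - name: node-cpu
    rules:
    - alert: HighNodeCPU
      expr: instance:node_cpu_utilisation:rate5m > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Node {{ $labels.instance }} CPU above 80% for 5 minutes"

Apply it with kubectl apply -f, and the alert will show up in Prometheus and Alertmanager rather than Grafana's own alerting engine.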

Best Practices

  1. Resource Limits: Set appropriate resource requests and limits for Prometheus and Grafana
  2. Retention Period: Configure the retention period based on your storage capacity
  3. Persistent Storage: Use persistent volumes for Prometheus data
  4. Federation: For large clusters, consider Prometheus federation
  5. Custom Metrics: Set up custom metrics for your applications using client libraries

Advanced Configuration

For a production environment, you’ll want to customize the Helm values. Create a values.yaml file:

prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 4Gi
        cpu: 1000m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

grafana:
  persistence:
    enabled: true
    size: 10Gi
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 200m

Then update your Helm release:

helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f values.yaml
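After the upgrade, it is worth confirming that the values were actually picked up. The commands below assume the release and namespace used throughout this guide:

```shell
# Show the user-supplied values for the release
helm get values prometheus -n monitoring

# Confirm the retention setting landed on the Prometheus custom resource
kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[0].spec.retention}'
```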

Troubleshooting

Common Issues

  1. Insufficient Resources: If pods are crashing, check if they have enough resources allocated
  2. Connectivity Issues: Ensure services can communicate with each other
  3. Data Retention: If Prometheus is losing data, check the storage configuration
  4. Target Scraping: If metrics aren’t appearing, check Prometheus targets status

Useful Commands

# Check pod status
kubectl get pods -n monitoring

# Check Prometheus targets
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
# Then visit http://localhost:9090/targets

# View Prometheus logs
kubectl logs -n monitoring deploy/prometheus-kube-prometheus-operator

# View Grafana logs
kubectl logs -n monitoring deploy/prometheus-grafana

Conclusion

You now have a robust monitoring solution for your Kubernetes cluster. With Prometheus collecting metrics and Grafana visualizing them, you’ll have deep insights into your cluster’s performance and health.

In future articles, we’ll explore more advanced topics like custom exporters, alert integrations, and high availability setups for your monitoring stack.