
Kubernetes YAML Manifests

A collection of production-ready Kubernetes manifest examples for various workload types and configuration patterns.

Deployment

Standard Deployment with resource limits, probes, and environment variables

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
  namespace: production
  labels:
    app: web-application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-application
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: web-application
    spec:
      containers:
      - name: web-application
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
          name: http
        env:
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: LOG_LEVEL
        resources:
          limits:
            cpu: "500m"
            memory: "512Mi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
Tags: deployment, production, nginx, probes

Service (ClusterIP)

Internal service for pod-to-pod communication within the cluster

service-clusterip.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-application
  namespace: production
  labels:
    app: web-application
spec:
  type: ClusterIP
  selector:
    app: web-application
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
Tags: service, clusterip, networking

Ingress (Traefik)

Ingress resource for Traefik with TLS and middleware

ingress-traefik.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-application
  namespace: production
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.middlewares: production-redirect@kubernetescrd
spec:
  ingressClassName: traefik
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-application
            port:
              number: 80
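
The router.middlewares annotation above references a Middleware named redirect in the production namespace (Traefik composes the reference as namespace-name@kubernetescrd). A hedged sketch of what that CRD might look like, assuming Traefik v3's traefik.io/v1alpha1 API (older releases use traefik.containo.us/v1alpha1):

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect
  namespace: production
spec:
  redirectScheme:
    scheme: https
    permanent: true
```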
Tags: ingress, traefik, tls, routing

ConfigMap

Configuration data for application settings and environment variables

configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  # Simple key-value pairs
  DATABASE_HOST: "postgres.database.svc.cluster.local"
  DATABASE_PORT: "5432"
  LOG_LEVEL: "info"

  # File-based configuration
  nginx.conf: |
    server {
        listen 80;
        server_name _;

        location / {
            root /usr/share/nginx/html;
            index index.html;
            try_files $uri $uri/ /index.html;
        }

        location /api {
            proxy_pass http://backend:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
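
A sketch of consuming this ConfigMap from a Deployment: the key-value pairs become environment variables via envFrom, and the nginx.conf key is projected as a file (the mount path assumes the stock nginx image layout):

```yaml
# Pod template fragment consuming app-config (illustrative sketch)
    spec:
      containers:
      - name: web-application
        image: nginx:1.25-alpine
        envFrom:
        - configMapRef:
            name: app-config      # exposes DATABASE_HOST, DATABASE_PORT, LOG_LEVEL
        volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: nginx.conf     # mount only this key, keep the rest of conf.d
      volumes:
      - name: nginx-conf
        configMap:
          name: app-config
          items:
          - key: nginx.conf
            path: nginx.conf
```

Note that subPath mounts are not refreshed when the ConfigMap changes; pods must be restarted to pick up a new nginx.conf.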
Tags: configmap, configuration, nginx

Secret

Storage for sensitive data like passwords and API keys; values are base64-encoded, not encrypted, so restrict access with RBAC and enable encryption at rest where needed

secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: production
type: Opaque
stringData:
  # Use stringData for plain text (auto-encoded to base64)
  DATABASE_USER: "app_user"
  DATABASE_PASSWORD: "secure-password-here"
  API_KEY: "your-api-key"

---
# For TLS certificates
apiVersion: v1
kind: Secret
metadata:
  name: app-tls-secret
  namespace: production
type: kubernetes.io/tls
data:
  # Base64-encoded certificate and key
  tls.crt: LS0tLS1CRUdJTi...
  tls.key: LS0tLS1CRUdJTi...
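
The data: form requires base64-encoded values, while stringData accepts plain text and is encoded by the API server on write. A quick sketch of encoding and decoding values by hand — printf avoids the trailing newline that echo would silently include in the encoding:

```shell
# Encode a value for a Secret's data: field
printf '%s' 'app_user' | base64
# -> YXBwX3VzZXI=

# Decode a value read back from an existing Secret
printf '%s' 'YXBwX3VzZXI=' | base64 -d
# -> app_user
```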
Tags: secret, security, tls, credentials

PersistentVolumeClaim

Storage request for persistent data with Longhorn storage class

pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

---
# For shared storage (ReadWriteMany)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: production
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn-nfs
  resources:
    requests:
      storage: 50Gi
Tags: pvc, storage, longhorn, persistent

NetworkPolicy

Network segmentation to control pod-to-pod traffic

networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-application
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow traffic from ingress controller
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: traefik
    ports:
    - protocol: TCP
      port: 80
  # Allow traffic from same namespace
  - from:
    - podSelector: {}
  egress:
  # Allow DNS resolution
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow database access
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: database
    ports:
    - protocol: TCP
      port: 5432
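
NetworkPolicies are additive, so the allow rules above are typically paired with a namespace-wide default-deny that they punch holes through. A minimal sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}        # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
```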
Tags: networkpolicy, security, networking, isolation

HorizontalPodAutoscaler

Auto-scaling based on CPU and memory utilization; the controller computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), so 3 pods averaging 140% of their CPU request against a 70% target scale to 6

hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-application-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-application
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
Tags: hpa, autoscaling, performance

StatefulSet

Stateful workload with stable network identities, ordered deployment, and persistent storage via volumeClaimTemplates

statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
  namespace: production
  labels:
    app: postgres
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - name: tcp-postgres
    port: 5432
    targetPort: 5432

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: production
  labels:
    app: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: postgres
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
      containers:
      - name: postgres
        image: postgres:16-alpine
        ports:
        - containerPort: 5432
          name: tcp-postgres
        env:
        - name: POSTGRES_DB
          value: "appdb"
        - name: PGDATA
          value: "/var/lib/postgresql/data/pgdata"
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: password
        resources:
          requests:
            cpu: "250m"
            memory: "512Mi"
          limits:
            cpu: "1"
            memory: "1Gi"
        livenessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - $(POSTGRES_USER)
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - $(POSTGRES_USER)
          initialDelaySeconds: 5
          periodSeconds: 5
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: longhorn
      resources:
        requests:
          storage: 20Gi
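
Through the headless Service, each replica gets a stable DNS identity of the form pod-name.service.namespace.svc.cluster.local, which clients can target directly. A sketch of pinning a client to the first replica — the PRIMARY_DSN variable name is hypothetical:

```yaml
# Client container fragment (illustrative; PRIMARY_DSN is a made-up name)
        env:
        - name: PRIMARY_DSN
          value: "postgresql://postgres-0.postgres-headless.production.svc.cluster.local:5432/appdb"
```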
Tags: statefulset, postgres, database, persistent-storage

DaemonSet

Node-level agent deployed to every node in the cluster, with tolerations for control-plane scheduling

daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: monitoring
  labels:
    app: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      serviceAccountName: log-collector
      containers:
      - name: log-collector
        image: fluent/fluent-bit:3.0
        resources:
          requests:
            cpu: "50m"
            memory: "64Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config
          mountPath: /fluent-bit/etc/
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers
      - name: config
        configMap:
          name: log-collector-config
Tags: daemonset, logging, monitoring, node-agent

Job and CronJob

One-time batch Job for database migration and a scheduled CronJob for nightly backups

job-cronjob.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: production
  labels:
    app: db-migration
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 600
  ttlSecondsAfterFinished: 86400
  template:
    metadata:
      labels:
        app: db-migration
    spec:
      restartPolicy: OnFailure
      containers:
      - name: migrate
        image: app-migrations:1.2.0
        command: ["./migrate", "--direction=up"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: connection-string
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
  namespace: production
  labels:
    app: nightly-backup
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        metadata:
          labels:
            app: nightly-backup
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: postgres:16-alpine
            command:
            - /bin/sh
            - -c
            - |
              pg_dump -h postgres-headless \
                -U "$PGUSER" -d appdb \
                -F c -f /backups/backup-$(date +%Y%m%d).dump
            env:
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: username
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            resources:
              requests:
                cpu: "100m"
                memory: "256Mi"
              limits:
                cpu: "500m"
                memory: "512Mi"
            volumeMounts:
            - name: backup-storage
              mountPath: /backups
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-data
Tags: job, cronjob, batch, backup, migration

RBAC

Least-privilege access control with ServiceAccount, Role, and RoleBinding scoped to a single namespace

rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: production
  labels:
    app: web-application
automountServiceAccountToken: false

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-role
  namespace: production
  labels:
    app: web-application
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["app-secrets"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "create", "update"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-role-binding
  namespace: production
  labels:
    app: web-application
subjects:
- kind: ServiceAccount
  name: app-service-account
  namespace: production
roleRef:
  kind: Role
  name: app-role
  apiGroup: rbac.authorization.k8s.io
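
Because the ServiceAccount sets automountServiceAccountToken: false, a workload that actually needs the Role's permissions must opt back in at the pod level (the pod-level field overrides the ServiceAccount default). A sketch:

```yaml
# Pod template fragment using the ServiceAccount (illustrative sketch)
    spec:
      serviceAccountName: app-service-account
      automountServiceAccountToken: true   # pod-level override of the SA default
      containers:
      - name: web-application
        image: nginx:1.25-alpine
```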
Tags: rbac, security, serviceaccount, least-privilege

PodDisruptionBudget

Maintain minimum availability during voluntary disruptions like node drains and cluster upgrades

pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-application-pdb
  namespace: production
  labels:
    app: web-application
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-application
  unhealthyPodEvictionPolicy: IfHealthyBudget

---
# Alternative: cap concurrent disruptions with maxUnavailable (here for the postgres StatefulSet)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: production
  labels:
    app: postgres
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: postgres
Tags: pdb, availability, disruption, upgrades

ResourceQuota + LimitRange

Namespace-level guardrails for CPU, memory, and object counts with default container limits

quota-limitrange.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "8"
    requests.memory: "16Gi"
    limits.cpu: "16"
    limits.memory: "32Gi"
    pods: "40"
    persistentvolumeclaims: "20"
    services: "15"
    secrets: "30"
    configmaps: "30"

---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
  - type: PersistentVolumeClaim
    max:
      storage: "100Gi"
    min:
      storage: "1Gi"
Tags: resourcequota, limitrange, namespace, governance

ServiceMonitor

Prometheus Operator CRD for automatic metrics scrape target discovery based on label selectors

servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-application
  namespace: production
  labels:
    app: web-application
    release: prometheus
spec:
  selector:
    matchLabels:
      app: web-application
  namespaceSelector:
    matchNames:
    - production
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
    scrapeTimeout: 10s
    honorLabels: true
  - port: http
    path: /metrics/detailed
    interval: 60s
    scrapeTimeout: 15s

---
# PrometheusRule for alerting on the same application
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-application-alerts
  namespace: production
  labels:
    app: web-application
    release: prometheus
spec:
  groups:
  - name: web-application.rules
    rules:
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{
          job="web-application",
          status=~"5.."
        }[5m])) /
        sum(rate(http_requests_total{
          job="web-application"
        }[5m])) > 0.05
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High 5xx error rate on web-application"
        description: "Error rate is above 5% for 5 minutes"
    - alert: PodRestartLooping
      expr: |
        increase(kube_pod_container_status_restarts_total{
          namespace="production",
          pod=~"web-application.*"
        }[1h]) > 3
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "Pod restart loop detected"
        description: "Pod {{ $labels.pod }} restarted more than 3 times in the last hour"
Tags: prometheus, monitoring, servicemonitor, alerting

TopologySpreadConstraints

Multi-zone pod spreading with topology constraints and anti-affinity: strict even distribution across zones, best-effort across nodes

topology-spread.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api-server
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      topologySpreadConstraints:
      # Spread evenly across availability zones
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api-server
      # Spread evenly across individual nodes within each zone
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api-server
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - api-server
            topologyKey: kubernetes.io/hostname
      containers:
      - name: api-server
        image: api-server:2.4.0
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1"
            memory: "512Mi"
        readinessProbe:
          httpGet:
            path: /healthz
            port: http
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /healthz
            port: http
          initialDelaySeconds: 15
          periodSeconds: 20
Tags: scheduling, ha, zones

Init Containers

Pod initialization pattern with sequential init containers for dependency checks, config fetching, and filesystem preparation before the main application starts

init-containers.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
  namespace: production
  labels:
    app: web-application
spec:
  initContainers:
  # 1. Wait for the database to become resolvable via DNS
  - name: wait-for-db
    image: busybox:1.36
    command:
    - /bin/sh
    - -c
    - |
      echo "Waiting for postgres to be ready..."
      until nslookup postgres.database.svc.cluster.local; do
        echo "postgres not ready - sleeping 2s"
        sleep 2
      done
      echo "postgres is reachable"
    resources:
      requests:
        cpu: "10m"
        memory: "16Mi"
      limits:
        cpu: "50m"
        memory: "32Mi"

  # 2. Download application config from S3
  - name: fetch-config
    image: amazon/aws-cli:2.15
    command:
    - /bin/sh
    - -c
    - |
      aws s3 cp s3://config-bucket/production/app.conf /config/app.conf
      aws s3 cp s3://config-bucket/production/features.json /config/features.json
      echo "Config files downloaded"
    env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: access-key
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: s3-credentials
          key: secret-key
    - name: AWS_DEFAULT_REGION
      value: "us-east-1"
    volumeMounts:
    - name: config-volume
      mountPath: /config
    resources:
      requests:
        cpu: "50m"
        memory: "64Mi"
      limits:
        cpu: "200m"
        memory: "128Mi"

  # 3. Set ownership and permissions on data directory
  - name: fix-permissions
    image: busybox:1.36
    command:
    - /bin/sh
    - -c
    - |
      chown -R 1000:1000 /data
      chmod 750 /data
      echo "Permissions set for uid 1000"
    securityContext:
      runAsUser: 0
    volumeMounts:
    - name: app-data
      mountPath: /data
    resources:
      requests:
        cpu: "10m"
        memory: "16Mi"
      limits:
        cpu: "50m"
        memory: "32Mi"

  # Main application container
  containers:
  - name: app
    image: web-application:3.1.0
    ports:
    - containerPort: 8080
      name: http
    securityContext:
      runAsUser: 1000
      runAsNonRoot: true
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: config-volume
      mountPath: /etc/app
      readOnly: true
    - name: app-data
      mountPath: /data
    - name: tmp
      mountPath: /tmp
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"

  volumes:
  - name: config-volume
    emptyDir: {}
  - name: app-data
    persistentVolumeClaim:
      claimName: app-data
  - name: tmp
    emptyDir:
      sizeLimit: "64Mi"
Tags: init, patterns, startup

Sidecar Pattern

Multi-container pod with Envoy reverse proxy sidecar sharing process namespace for signal proxying, plus a Fluent Bit log shipper reading from a shared volume

sidecar-pattern.yaml
# Envoy sidecar proxy alongside an application container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-proxy
  namespace: production
  labels:
    app: app-with-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-with-proxy
  template:
    metadata:
      labels:
        app: app-with-proxy
    spec:
      shareProcessNamespace: true
      terminationGracePeriodSeconds: 30
      containers:
      # Primary application container
      - name: app
        image: web-application:3.1.0
        ports:
        - containerPort: 8080
          name: app-http
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
        - name: tmp
          mountPath: /tmp
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "1"
            memory: "512Mi"
        # With shareProcessNamespace enabled, PID 1 is the pause container,
        # so "kill -SIGTERM 1" would signal the wrong process. Sleep instead
        # to let in-flight requests drain before the kubelet sends SIGTERM.
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - "sleep 5"

      # Envoy sidecar proxy
      - name: envoy-proxy
        image: envoyproxy/envoy:v1.29-latest
        ports:
        - containerPort: 8443
          name: https
        - containerPort: 9901
          name: envoy-admin
        volumeMounts:
        - name: envoy-config
          mountPath: /etc/envoy
          readOnly: true
        resources:
          requests:
            cpu: "100m"
            memory: "64Mi"
          limits:
            cpu: "500m"
            memory: "128Mi"
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              # The Envoy admin /healthcheck/fail endpoint requires POST
              - "curl -s -X POST http://localhost:9901/healthcheck/fail && sleep 10"

      # Fluent Bit log sidecar
      - name: log-shipper
        image: fluent/fluent-bit:3.0
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
          readOnly: true
        resources:
          requests:
            cpu: "25m"
            memory: "32Mi"
          limits:
            cpu: "100m"
            memory: "64Mi"

      volumes:
      - name: envoy-config
        configMap:
          name: envoy-proxy-config
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-sidecar-config
      - name: app-logs
        emptyDir:
          sizeLimit: "256Mi"
      - name: tmp
        emptyDir:
          sizeLimit: "64Mi"
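
On Kubernetes 1.29+ (SidecarContainers enabled by default), the log shipper could instead run as a native sidecar: an init container with restartPolicy: Always, which starts before and terminates after the main container. A sketch of the alternative declaration:

```yaml
# Pod template fragment: log shipper as a native sidecar (sketch, k8s 1.29+)
      initContainers:
      - name: log-shipper
        image: fluent/fluent-bit:3.0
        restartPolicy: Always   # marks this init container as a native sidecar
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
```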
Tags: patterns, proxy, logging

Sealed Secrets

Encrypted secret storage for GitOps workflows using Bitnami Sealed Secrets, with namespace-scoped and cluster-wide sealing examples

sealed-secret.yaml
# Create a SealedSecret from the CLI:
#
# 1. Write a regular Secret manifest:
#    kubectl create secret generic db-credentials \
#      --from-literal=username=app_user \
#      --from-literal=password=s3cur3-pa55 \
#      --dry-run=client -o yaml > secret.yaml
#
# 2. Seal it (namespace-scoped, the default):
#    kubeseal --format yaml \
#      --controller-name=sealed-secrets \
#      --controller-namespace=kube-system \
#      < secret.yaml > sealed-secret.yaml
#
# 3. Seal it (cluster-wide, reusable across namespaces):
#    kubeseal --format yaml --scope cluster-wide \
#      --controller-name=sealed-secrets \
#      --controller-namespace=kube-system \
#      < secret.yaml > sealed-secret-cluster.yaml
#
# 4. Apply the SealedSecret (controller decrypts it):
#    kubectl apply -f sealed-secret.yaml

# Namespace-scoped SealedSecret (default)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: production
  annotations:
    sealedsecrets.bitnami.com/managed: "true"
spec:
  encryptedData:
    username: AgBy3i4OJSWK+PiTySYZZA9rO...truncated...
    password: AgCtr8KJSWK+AiTtSYOZA7pQ...truncated...
  template:
    metadata:
      name: db-credentials
      namespace: production
      labels:
        app: web-application
    type: Opaque

---
# Cluster-wide SealedSecret (usable in any namespace)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: shared-api-key
  namespace: production
  annotations:
    sealedsecrets.bitnami.com/cluster-wide: "true"
spec:
  encryptedData:
    api-key: AgDf7kMNOPQR+XyZaBcDeF1gH...truncated...
    api-secret: AgHj9lSTUVWX+AbCdEfGhI2jK...truncated...
  template:
    metadata:
      name: shared-api-key
      annotations:
        sealedsecrets.bitnami.com/cluster-wide: "true"
    type: Opaque
Tags: secrets, security, gitops

VPA (VerticalPodAutoscaler)

Vertical pod autoscaling with recommendation-only and auto-update modes, bounded by container-level min/max resource policies

vpa.yaml
# Recommendation-only mode: observe suggestions without applying them
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-application-vpa-recommend
  namespace: production
  labels:
    app: web-application
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-application
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: web-application
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "2"
        memory: "2Gi"

---
# Auto mode: VPA evicts and resizes pods to match recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa-auto
  namespace: production
  labels:
    app: api-server
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2
  resourcePolicy:
    containerPolicies:
    - containerName: api-server
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "4Gi"
    # Exclude sidecar containers from VPA control
    - containerName: log-shipper
      mode: "Off"
Tags: scaling, resources, autoscaling

Kustomize Overlay

Full Kustomize base and overlay structure with configMapGenerator, secretGenerator, strategic merge patches, and JSON 6902 patches for multi-environment GitOps

kustomization.yaml
# base/kustomization.yaml
# Shared resources across all environments
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- deployment.yaml
- service.yaml
- networkpolicy.yaml
- hpa.yaml

commonLabels:
  app.kubernetes.io/managed-by: kustomize
  app.kubernetes.io/part-of: web-platform

configMapGenerator:
- name: app-config
  literals:
  - LOG_LEVEL=info
  - CACHE_TTL=300
  - METRICS_ENABLED=true

secretGenerator:
- name: app-tls
  files:
  - tls.crt=certs/tls.crt
  - tls.key=certs/tls.key
  type: kubernetes.io/tls

---
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- ../../base

namespace: production
namePrefix: prod-

commonLabels:
  environment: production
  tier: frontend

replicas:
- name: web-application
  count: 5

configMapGenerator:
- name: app-config
  behavior: merge
  literals:
  - LOG_LEVEL=warn
  - CACHE_TTL=3600
  - DATABASE_HOST=postgres.database.svc.cluster.local

images:
- name: web-application
  newName: registry.example.com/web-application
  newTag: v2.4.0

patchesStrategicMerge:
- resource-limits.yaml
- tolerations.yaml

patchesJson6902:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: web-application
  patch: |-
    - op: add
      path: /spec/template/spec/containers/0/env/-
      value:
        name: ENVIRONMENT
        value: production
    - op: replace
      path: /spec/template/spec/containers/0/resources/limits/memory
      value: 1Gi

---
# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- ../../base

namespace: staging
namePrefix: stg-

commonLabels:
  environment: staging

replicas:
- name: web-application
  count: 2

configMapGenerator:
- name: app-config
  behavior: merge
  literals:
  - LOG_LEVEL=debug
  - DATABASE_HOST=postgres.staging.svc.cluster.local

images:
- name: web-application
  newName: registry.example.com/web-application
  newTag: v2.5.0-rc1
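
The patchesStrategicMerge entries reference patch files that live alongside the overlay. A hedged sketch of what resource-limits.yaml might contain — kustomize matches it to the base Deployment by apiVersion/kind/name and merges only the listed fields (the values here are illustrative):

```yaml
# overlays/production/resource-limits.yaml (illustrative sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
spec:
  template:
    spec:
      containers:
      - name: web-application
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "250m"
            memory: "256Mi"
```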
Tags: kustomize, gitops, overlays

Helm Charts

Curated Helm charts for deploying common applications and services on Kubernetes.

Traefik

Kubernetes Ingress Controller with automatic HTTPS

Repo: https://helm.traefik.io/traefik v26.0.0
values.yaml
# values.yaml for Traefik
deployment:
  replicas: 2

globalArguments:
- "--global.sendAnonymousUsage=false"
- "--global.checkNewVersion=false"

additionalArguments:
- "--log.level=DEBUG"
- "--accesslog=true"
- "--accesslog.format=json"
- "--entrypoints.websecure.http.tls.certResolver=letsencrypt"

ingressClass:
  enabled: true
  isDefaultClass: true

ports:
  web:
    port: 8000
    exposedPort: 80
    redirections:
      entryPoint:
        to: websecure
        scheme: https
        permanent: true
  websecure:
    port: 8443
    exposedPort: 443
    tls:
      enabled: true

certificatesResolvers:
  letsencrypt:
    acme:
      email: [email protected]
      storage: /data/acme.json
      dnsChallenge:
        provider: cloudflare
        resolvers:
        - "1.1.1.1:53"
        - "8.8.8.8:53"
        delayBeforeCheck: 30

providers:
  file:
    directory: /etc/traefik/dynamic
    watch: true
  kubernetesIngress:
    publishedService:
      enabled: true

metrics:
  prometheus:
    entryPoint: metrics
    addEntryPointsLabels: true
    addServicesLabels: true
    addRoutersLabels: true
    buckets: "0.1,0.3,1.2,5.0"

service:
  type: LoadBalancer

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

persistence:
  enabled: true
  size: 128Mi
  storageClass: longhorn

ingressRoute:
  dashboard:
    enabled: true
Tags: ingress, proxy, https
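With the `letsencrypt` resolver configured above, routes can request certificates directly. A minimal IngressRoute sketch; the host name is an assumption, not part of the chart values:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: web-application
  namespace: production
spec:
  entryPoints:
  - websecure
  routes:
  - match: Host(`app.example.com`)   # hypothetical host
    kind: Rule
    services:
    - name: web-application
      port: 80
  tls:
    certResolver: letsencrypt        # resolver defined in values.yaml above
```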

Longhorn

Cloud-native distributed storage for Kubernetes

Repo: https://charts.longhorn.io v1.6.0
values.yaml
# values.yaml for Longhorn
persistence:
  defaultClass: true
  defaultClassReplicaCount: 2
  reclaimPolicy: Retain
  defaultFsType: ext4

defaultSettings:
  backupTarget: "nfs://10.42.0.50:/mnt/backups/longhorn"
  backupTargetCredentialSecret: ""
  defaultReplicaCount: 2
  storageMinimalAvailablePercentage: 15
  defaultDataLocality: best-effort
  autoDeletePodWhenVolumeDetachedUnexpectedly: true
  replicaSoftAntiAffinity: true
  storageOverProvisioningPercentage: 150
  guaranteedInstanceManagerCPU: 12
  concurrentAutomaticEngineUpgradePerNode: 1

csi:
  attacherReplicaCount: 2
  provisionerReplicaCount: 2
  snapshotterReplicaCount: 2

ingress:
  enabled: true
  ingressClassName: traefik
  host: longhorn.example.com
  tls: true
  tlsSecret: longhorn-tls

longhornUI:
  replicas: 2

# Note: in the chart values, recurringJobSelector is nested under
# `persistence`, not at the top level:
# persistence:
#   recurringJobSelector:
#     enable: true

# Apply recurring jobs after install via CRD:
# apiVersion: longhorn.io/v1beta2
# kind: RecurringJob
# metadata:
#   name: snapshot-hourly
#   namespace: longhorn-system
# spec:
#   cron: "0 * * * *"
#   task: snapshot
#   groups:
#   - default
#   retain: 24
#   concurrency: 2
#   labels:
#     type: hourly
#
# ---
# apiVersion: longhorn.io/v1beta2
# kind: RecurringJob
# metadata:
#   name: backup-daily
#   namespace: longhorn-system
# spec:
#   cron: "0 2 * * *"
#   task: backup
#   groups:
#   - default
#   retain: 14
#   concurrency: 1
#   labels:
#     type: daily
Tags: storage, distributed, backup
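Once Longhorn is installed as the default StorageClass, workloads claim storage the usual way. A minimal PVC sketch (name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: longhorn   # may be omitted when longhorn is the default class
  resources:
    requests:
      storage: 10Gi
```

With `defaultClassReplicaCount: 2` as configured above, each volume keeps two replicas on different nodes.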

Prometheus Stack

Complete monitoring solution with Prometheus, Grafana, and Alertmanager

Repo: https://prometheus-community.github.io/helm-charts v56.0.0
values.yaml
# values.yaml for kube-prometheus-stack
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 50Gi

grafana:
  adminPassword: changeme   # placeholder, override at install time
  persistence:
    enabled: true
    size: 10Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 5Gi
Tags: monitoring, metrics, alerting
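To scrape an application with this stack, add a ServiceMonitor. By default the chart only picks up monitors labelled with the Helm release name; the `release` label below assumes a release named `kube-prometheus-stack`, and the metrics path is illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-application
  namespace: production
  labels:
    release: kube-prometheus-stack   # must match the chart's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: web-application
  endpoints:
  - port: http        # named port on the Service
    path: /metrics    # assumed metrics path
    interval: 30s
```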

ArgoCD

Declarative GitOps continuous delivery for Kubernetes

Repo: https://argoproj.github.io/argo-helm v6.0.0
values.yaml
# values.yaml for argo-cd
server:
  replicas: 2
  ingress:
    enabled: true
    ingressClassName: traefik
    hosts:
    - argocd.example.com
    tls:
    - secretName: argocd-tls
      hosts:
      - argocd.example.com

controller:
  replicas: 1
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "1"
      memory: "1Gi"

repoServer:
  replicas: 2
  resources:
    requests:
      cpu: "100m"
      memory: "256Mi"

configs:
  repositories:
    private-repo:
      url: https://git.example.com/infra/k8s-manifests.git
      type: git
      passwordSecret:
        name: repo-credentials
        key: password
      usernameSecret:
        name: repo-credentials
        key: username

redis-ha:
  enabled: true
Tags: gitops, cd, deployment, argocd
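After the chart is installed, workloads are declared as Application resources. A sketch pointing at the repository configured above; the branch and path are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-application
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/infra/k8s-manifests.git
    targetRevision: main          # assumed branch
    path: overlays/production     # assumed path within the repo
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from git
      selfHeal: true   # revert manual drift in the cluster
```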

CloudNativePG

Kubernetes operator for managing PostgreSQL clusters with built-in backup and failover

Repo: https://cloudnative-pg.github.io/charts v0.20.0
values.yaml
# values.yaml for cloudnative-pg operator
# Install the operator first, then create Cluster resources

# Operator configuration
crds:
  create: true

monitoring:
  podMonitorEnabled: true

# --- After operator install, create a Cluster CR ---
# apiVersion: postgresql.cnpg.io/v1
# kind: Cluster
# metadata:
#   name: app-database
#   namespace: production
# spec:
#   instances: 3
#   primaryUpdateStrategy: unsupervised
#
#   storage:
#     size: 50Gi
#     storageClass: longhorn
#
#   postgresql:
#     parameters:
#       shared_buffers: "256MB"
#       max_connections: "200"
#
#   backup:
#     barmanObjectStore:
#       destinationPath: "s3://backups/cnpg/"
#       s3Credentials:
#         accessKeyId:
#           name: backup-credentials
#           key: ACCESS_KEY_ID
#         secretAccessKey:
#           name: backup-credentials
#           key: SECRET_ACCESS_KEY
#     retentionPolicy: "30d"
#
#   monitoring:
#     enablePodMonitor: true
Tags: postgresql, database, operator, ha

Velero

Cluster backup and disaster recovery with scheduled snapshots and cross-cluster restore

Repo: https://vmware-tanzu.github.io/helm-charts v5.0.0
values.yaml
# values.yaml for velero
configuration:
  backupStorageLocation:
  - name: default
    provider: aws
    bucket: cluster-backups
    config:
      region: us-east-1
      s3ForcePathStyle: true
      s3Url: https://s3.example.com

  volumeSnapshotLocation:
  - name: default
    provider: aws
    config:
      region: us-east-1

credentials:
  secretContents:
    cloud: |
      [default]
      aws_access_key_id=YOUR_ACCESS_KEY
      aws_secret_access_key=YOUR_SECRET_KEY

schedules:
  daily-backup:
    disabled: false
    schedule: "0 3 * * *"
    template:
      ttl: "168h"
      includedNamespaces:
      - production
      - staging
      storageLocation: default
      volumeSnapshotLocations:
      - default

  weekly-full:
    disabled: false
    schedule: "0 1 * * 0"
    template:
      ttl: "720h"
      includedNamespaces:
      - "*"
      storageLocation: default

initContainers:
- name: velero-plugin-for-aws
  image: velero/velero-plugin-for-aws:v1.9.0
  volumeMounts:
  - mountPath: /target
    name: plugins
Tags: backup, disaster-recovery, restore, snapshots
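Restores can be declared as resources as well as via the CLI (`velero restore create --from-backup <name>`). A sketch of a Restore object; the backup name is a placeholder for one produced by the `daily-backup` schedule above:

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-production
  namespace: velero
spec:
  backupName: daily-backup-20240101030000   # hypothetical backup name
  includedNamespaces:
  - production
  restorePVs: true    # also restore persistent volumes from snapshots
```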

Cert-Manager

Automated TLS certificate lifecycle management with Let's Encrypt and private CA support

Repo: https://charts.jetstack.io v1.14.0
values.yaml
# values.yaml for cert-manager
installCRDs: true

replicaCount: 2

resources:
  requests:
    cpu: "50m"
    memory: "128Mi"
  limits:
    cpu: "200m"
    memory: "256Mi"

ingressShim:
  defaultIssuerName: letsencrypt-prod
  defaultIssuerKind: ClusterIssuer
  defaultIssuerGroup: cert-manager.io

prometheus:
  enabled: true
  servicemonitor:
    enabled: true

webhook:
  replicaCount: 2
  resources:
    requests:
      cpu: "25m"
      memory: "32Mi"

cainjector:
  replicaCount: 1
  resources:
    requests:
      cpu: "25m"
      memory: "64Mi"
Tags: certificates, tls, letsencrypt, automation
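With the ingress-shim defaults above, any Ingress annotated with a cluster issuer gets its certificate provisioned into the referenced secret automatically. A sketch with an assumed host name:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-application
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
  - hosts:
    - app.example.com        # hypothetical host
    secretName: web-tls      # cert-manager creates and renews this secret
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-application
            port:
              number: 80
```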

Kubernetes Operators

Examples and guides for using Kubernetes Operators to automate application lifecycle management.

Cert-Manager

Automate certificate management in Kubernetes

Install: kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
example-usage.yaml
# ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: traefik
Tags: certificates, tls, automation
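Certificates can also be requested explicitly instead of via ingress annotations. A minimal Certificate sketch referencing the ClusterIssuer above; the DNS name is an assumption:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: web-tls
  namespace: production
spec:
  secretName: web-tls          # where the signed cert/key pair is stored
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - app.example.com            # hypothetical host
  renewBefore: 360h            # renew 15 days before expiry
```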

External Secrets

Sync secrets from external secret stores (Vault, AWS, etc.)

Install: helm install external-secrets external-secrets/external-secrets -n external-secrets --create-namespace
example-usage.yaml
# ExternalSecret syncing from Vault
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: app-secrets
  data:
  - secretKey: database-password
    remoteRef:
      key: secret/data/production/db
      property: password
Tags: secrets, vault, security
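The `vault-backend` store referenced above has to exist first. A sketch of a Vault-backed ClusterSecretStore using Kubernetes auth; the server URL and role are assumptions:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: https://vault.example.com   # hypothetical Vault address
      path: secret                        # KV mount path
      version: v2                         # KV v2 engine
      auth:
        kubernetes:
          mountPath: kubernetes
          role: external-secrets          # assumed Vault role
```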

Sealed Secrets Controller

Encrypt Kubernetes Secrets into SealedSecrets safe for storage in Git, decrypted only by the in-cluster controller

Install: helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system --set-string fullnameOverride=sealed-secrets-controller
example-usage.yaml
# Install kubeseal CLI:
# brew install kubeseal  (macOS)
# wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.27.0/kubeseal-0.27.0-linux-amd64.tar.gz
# tar xfz kubeseal-*.tar.gz && install -m 755 kubeseal /usr/local/bin/kubeseal

# Fetch the controller's public key (for offline sealing):
# kubeseal --fetch-cert --controller-name=sealed-secrets-controller \
#   --controller-namespace=kube-system > pub-cert.pem

# Create and seal a secret:
# kubectl create secret generic db-creds \
#   --from-literal=password=hunter2 \
#   --dry-run=client -o yaml | \
#   kubeseal --format yaml --cert pub-cert.pem > sealed-db-creds.yaml

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-creds
  namespace: production
spec:
  encryptedData:
    password: AgBy3i4OJSWK+PiTySYZZA9rO43cGDnR...
  template:
    metadata:
      name: db-creds
      namespace: production
      labels:
        app: web-application
    type: Opaque
Tags: security, gitops

Kyverno Policy Engine

Kubernetes-native policy management for admission control, mutation, and resource validation without learning a new language

Install: helm install kyverno kyverno/kyverno -n kyverno --create-namespace --set replicaCount=3
example-usage.yaml
# ClusterPolicy: require labels, block latest tag, enforce resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
  annotations:
    policies.kyverno.io/title: Require Labels
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: check-required-labels
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Labels 'app' and 'owner' are required on all Pods."
      pattern:
        metadata:
          labels:
            app: "?*"
            owner: "?*"

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: validate-image-tag
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Using 'latest' tag is not allowed. Pin to a specific version."
      pattern:
        spec:
          containers:
          - image: "!*:latest"

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Audit
  background: true
  rules:
  - name: check-resource-limits
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "CPU and memory limits are required for all containers."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"
                cpu: "?*"
Tags: policy, governance, security

Kubernetes Best Practices

Tips, tricks, and best practices for managing Kubernetes clusters effectively and securely.

  • Resource Requests and Limits

    Always define CPU and memory requests/limits to ensure fair scheduling and prevent resource starvation.

    • Set requests to typical usage, limits to maximum acceptable burst
    • Use LimitRange to enforce defaults namespace-wide so no pod runs unbounded
    • Monitor actual usage with Prometheus and tune values quarterly
    • Avoid setting CPU limits too tight — CPU throttling degrades latency without killing the pod
    • Memory limits should be hard: OOM kills are better than node-level memory pressure
    • Use VPA in recommendation mode to gather data before committing to values
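A LimitRange that implements the namespace-wide defaults mentioned above; the values are illustrative:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:      # applied when a container sets no requests
      cpu: "100m"
      memory: 128Mi
    default:             # applied when a container sets no limits
      cpu: "500m"
      memory: 512Mi
```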
  • Security Contexts

    Run containers as non-root users and use read-only filesystems where possible.

    • Set runAsNonRoot: true in the pod security context to block root containers at admission
    • Use readOnlyRootFilesystem: true and mount writable emptyDir volumes only where needed
    • Drop all Linux capabilities with drop: ["ALL"] and add back only what the process requires
    • Set allowPrivilegeEscalation: false to prevent child processes from gaining more privileges
    • Use seccompProfile type RuntimeDefault to apply the container runtime's default syscall filter
    • Assign a non-zero runAsUser and runAsGroup to avoid UID 0 even inside distroless images
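The points above combine into a pod spec fragment like this; the UID/GID values are illustrative, and some images need extra writable mounts to actually run non-root:

```yaml
# Pod template fragment
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: web-application
    image: nginx:1.25-alpine
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: tmp
      mountPath: /tmp        # writable scratch space on a read-only root
  volumes:
  - name: tmp
    emptyDir: {}
```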
  • Health Probes

    Configure liveness, readiness, and startup probes for reliable deployments.

    • Readiness probes gate traffic: a failing readiness probe removes the pod from Service endpoints
    • Liveness probes trigger restarts: use them to recover from deadlocks, not slow responses
    • Startup probes run first and disable liveness/readiness until the app is initialized
    • Set initialDelaySeconds high enough to survive cold starts but low enough to detect real failures
    • Never point liveness probes at endpoints that depend on downstream services — use a local /healthz
    • Use different paths for readiness (/ready) and liveness (/healthz) so they can fail independently
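Putting the three probe types together on one container; paths and timings are illustrative:

```yaml
containers:
- name: web-application
  startupProbe:              # runs first; liveness/readiness wait for it to pass
    httpGet:
      path: /healthz
      port: http
    failureThreshold: 30
    periodSeconds: 5         # tolerates up to 150s of cold start
  livenessProbe:             # restarts the container on deadlock
    httpGet:
      path: /healthz         # local check, no downstream dependencies
      port: http
    periodSeconds: 10
  readinessProbe:            # gates Service traffic
    httpGet:
      path: /ready
      port: http
    periodSeconds: 5
```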
  • Network Policies

    Implement network segmentation to control traffic between pods and namespaces.

    • Start with default-deny for both ingress and egress in every namespace
    • Explicitly allow DNS egress (UDP/TCP 53) or pods cannot resolve service names
    • Separate concerns by namespace (frontend, backend, database) and restrict cross-namespace traffic
    • Use namespaceSelector with labels to allow traffic from specific namespaces like the ingress controller
    • Test policies with a curl pod before enforcing — a misconfigured policy can take down an entire namespace
    • Label namespaces consistently (e.g., kubernetes.io/metadata.name) to make selectors predictable
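A default-deny policy plus the DNS egress exception described above:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - ports:                   # any destination, DNS ports only
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```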
  • Pod Scheduling

    Control where pods land using affinity rules, topology spread constraints, and node selectors for high availability.

    • Spread replicas across availability zones with topologySpreadConstraints and maxSkew: 1
    • Use pod anti-affinity to prevent multiple replicas of the same app on one node
    • Prefer soft (preferred) anti-affinity over hard (required) to avoid unschedulable pods in small clusters
    • Use node selectors for simple hardware requirements like GPU or SSD-backed nodes
    • Combine topology spread with PodDisruptionBudget to survive zone failures and rolling upgrades
    • Set whenUnsatisfiable: ScheduleAnyway for node-level spread so the scheduler degrades gracefully
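The zone/node spread and disruption budget sketched as manifest fragments; the label selector assumes the `app: web-application` labels used elsewhere in this collection:

```yaml
# Pod template fragment
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule       # hard spread across zones
    labelSelector:
      matchLabels:
        app: web-application
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway      # soft spread across nodes
    labelSelector:
      matchLabels:
        app: web-application
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-application
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-application
```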
  • Secret Management

    Keep sensitive data out of manifests and version control using external secret stores and encryption at rest.

    • Never commit plain Kubernetes Secrets or credentials to git repositories
    • Use External Secrets Operator to sync from HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault
    • Use Sealed Secrets as an alternative when you need encrypted secrets stored directly in git
    • Rotate secrets on a regular schedule and audit access with RBAC and audit logging
    • Enable encryption at rest for etcd so secrets on disk are not stored in plaintext
    • Mount secrets as files instead of environment variables to reduce exposure in process listings and crash dumps
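Mounting a secret as files instead of environment variables, as recommended above; the secret name is illustrative:

```yaml
# Pod template fragment
spec:
  containers:
  - name: web-application
    image: nginx:1.25-alpine
    volumeMounts:
    - name: db-creds
      mountPath: /etc/secrets
      readOnly: true         # each key appears as a file, e.g. /etc/secrets/password
  volumes:
  - name: db-creds
    secret:
      secretName: db-creds
```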
  • Namespace Organization

    Structure cluster tenancy with namespaces per team or environment, enforced with quotas and default-deny policies.

    • Use labels like team, environment, and cost-center on every namespace for ownership and chargeback
    • Separate production, staging, and development into distinct namespaces with different quota limits
    • Apply a default-deny NetworkPolicy in each namespace and explicitly allow required traffic
    • Set LimitRange defaults so containers without resource specs get bounded automatically
    • Apply ResourceQuota to cap total CPU, memory, and object counts per namespace
    • Automate namespace provisioning with a controller or Helm chart so every namespace ships with policies pre-applied
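A ResourceQuota capping a namespace along the lines described above; the numbers are illustrative:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    persistentvolumeclaims: "20"
```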