Kubernetes Resources
Kubernetes YAML Manifests
A collection of production-ready Kubernetes manifest examples for various workload types and configuration patterns.
Deployment
Standard Deployment with a rolling-update strategy, resource limits, and liveness/readiness probes
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-application
namespace: production
labels:
app: web-application
spec:
replicas: 3
selector:
matchLabels:
app: web-application
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
app: web-application
spec:
containers:
- name: web-application
image: nginx:1.25-alpine
ports:
- containerPort: 80
name: http
resources:
limits:
cpu: "500m"
memory: "512Mi"
requests:
cpu: "100m"
memory: "128Mi"
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
Service (ClusterIP)
Internal service for pod-to-pod communication within the cluster
apiVersion: v1
kind: Service
metadata:
name: web-application
namespace: production
labels:
app: web-application
spec:
type: ClusterIP
selector:
app: web-application
ports:
- name: http
port: 80
targetPort: 80
protocol: TCP
- name: https
port: 443
targetPort: 443
protocol: TCP
Ingress (Traefik)
Ingress resource for Traefik with TLS and middleware
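The `router.middlewares` annotation in the manifest below references `production-redirect@kubernetescrd`, which Traefik resolves as a Middleware named `redirect` in the `production` namespace; that Middleware must exist or the router fails to load. A minimal sketch, assuming the Traefik v3 `traefik.io/v1alpha1` CRDs (the name and namespace are derived from the annotation):

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect        # referenced as <namespace>-<name>@kubernetescrd
  namespace: production
spec:
  redirectScheme:
    scheme: https
    permanent: true
```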
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-application
namespace: production
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: websecure
traefik.ingress.kubernetes.io/router.tls: "true"
traefik.ingress.kubernetes.io/router.middlewares: production-redirect@kubernetescrd
spec:
ingressClassName: traefik
tls:
- hosts:
- app.example.com
secretName: app-tls-secret
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: web-application
port:
number: 80
ConfigMap
Configuration data for application settings and environment variables
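A pod consumes this ConfigMap either as environment variables or as mounted files; a pod-spec fragment sketching both (the `nginx-conf` volume name is illustrative):

```yaml
containers:
  - name: web-application
    envFrom:
      - configMapRef:
          name: app-config     # DATABASE_HOST, DATABASE_PORT, LOG_LEVEL become env vars
    volumeMounts:
      - name: nginx-conf
        mountPath: /etc/nginx/conf.d
volumes:
  - name: nginx-conf
    configMap:
      name: app-config
      items:
        - key: nginx.conf
          path: default.conf   # mounted as /etc/nginx/conf.d/default.conf
```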
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: production
data:
# Simple key-value pairs
DATABASE_HOST: "postgres.database.svc.cluster.local"
DATABASE_PORT: "5432"
LOG_LEVEL: "info"
# File-based configuration
nginx.conf: |
server {
listen 80;
server_name _;
location / {
root /usr/share/nginx/html;
index index.html;
try_files $uri $uri/ /index.html;
}
location /api {
proxy_pass http://backend:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
Secret
Secure storage for sensitive data like passwords and API keys
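`stringData` accepts plain text, but the `data` field (used by the TLS secret below) requires base64. Encoding and checking values by hand:

```shell
# Encode a value for a Secret's data: field (-n avoids a trailing newline)
echo -n 'app_user' | base64
# → YXBwX3VzZXI=

# Decode to verify what a Secret actually stores
echo 'YXBwX3VzZXI=' | base64 -d
# → app_user
```

Note that base64 is an encoding, not encryption; anyone with read access to the Secret can decode it.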
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
namespace: production
type: Opaque
stringData:
# Use stringData for plain text (auto-encoded to base64)
DATABASE_USER: "app_user"
DATABASE_PASSWORD: "secure-password-here"
API_KEY: "your-api-key"
---
# For TLS certificates
apiVersion: v1
kind: Secret
metadata:
name: app-tls-secret
namespace: production
type: kubernetes.io/tls
data:
# Base64-encoded certificate and key
tls.crt: LS0tLS1CRUdJTi...
tls.key: LS0tLS1CRUdJTi...
PersistentVolumeClaim
Storage request for persistent data with Longhorn storage class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: app-data
namespace: production
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 10Gi
---
# For shared storage (ReadWriteMany)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: shared-data
namespace: production
spec:
accessModes:
- ReadWriteMany
storageClassName: longhorn-nfs
resources:
requests:
storage: 50Gi
NetworkPolicy
Network segmentation to control pod-to-pod traffic
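NetworkPolicies are additive, so a namespace typically starts from a default-deny baseline and the policy below then carves out the allowed paths. A minimal deny-all sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}   # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```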
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: app-network-policy
namespace: production
spec:
podSelector:
matchLabels:
app: web-application
policyTypes:
- Ingress
- Egress
ingress:
# Allow traffic from ingress controller
- from:
- namespaceSelector:
matchLabels:
name: traefik
ports:
- protocol: TCP
port: 80
# Allow traffic from same namespace
- from:
- podSelector: {}
egress:
# Allow DNS resolution
- to:
- namespaceSelector: {}
ports:
- protocol: UDP
port: 53
# Allow database access
- to:
- namespaceSelector:
matchLabels:
name: database
ports:
- protocol: TCP
port: 5432
HorizontalPodAutoscaler
Auto-scaling based on CPU and memory utilization
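The v2 autoscaler computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). For example, 3 replicas averaging 90% CPU against the 70% target below scale out to 4:

```shell
# ceil(3 * 90 / 70) = ceil(3.857...) = 4, done here with integer math
echo $(( (3 * 90 + 70 - 1) / 70 ))
# → 4
```

Utilization percentages are measured against the pod's resource *requests*, which is one more reason to set requests deliberately.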
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-application-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-application
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
StatefulSet
Stateful workload with stable network identities, ordered deployment, and persistent storage via volumeClaimTemplates
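The headless Service below gives each ordinal a stable DNS identity, which is what clients and replication configuration should target rather than pod IPs:

```shell
# StatefulSet pods resolve as <pod>.<headless-service>.<namespace>.svc.cluster.local
sts=postgres; svc=postgres-headless; ns=production
for i in 0 1 2; do
  echo "${sts}-${i}.${svc}.${ns}.svc.cluster.local"
done
```

The first of these names (`postgres-0.…`) is the conventional target for the primary in a single-writer setup.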
apiVersion: v1
kind: Service
metadata:
name: postgres-headless
namespace: production
labels:
app: postgres
spec:
clusterIP: None
selector:
app: postgres
ports:
- name: tcp-postgres
port: 5432
targetPort: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: production
labels:
app: postgres
spec:
serviceName: postgres-headless
replicas: 3
selector:
matchLabels:
app: postgres
podManagementPolicy: OrderedReady
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app: postgres
spec:
securityContext:
runAsNonRoot: true
runAsUser: 999
fsGroup: 999
containers:
- name: postgres
image: postgres:16-alpine
ports:
- containerPort: 5432
name: tcp-postgres
env:
- name: POSTGRES_DB
value: "appdb"
- name: PGDATA
value: "/var/lib/postgresql/data/pgdata"
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-credentials
key: username
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-credentials
key: password
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
livenessProbe:
exec:
# run through a shell so the env var expands reliably
command:
- /bin/sh
- -c
- pg_isready -U "$POSTGRES_USER"
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- /bin/sh
- -c
- pg_isready -U "$POSTGRES_USER"
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 20Gi
DaemonSet
Node-level agent deployed to every node in the cluster, with tolerations for control-plane scheduling
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: log-collector
namespace: monitoring
labels:
app: log-collector
spec:
selector:
matchLabels:
app: log-collector
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
template:
metadata:
labels:
app: log-collector
spec:
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
serviceAccountName: log-collector
containers:
- name: log-collector
image: fluent/fluent-bit:3.0
resources:
requests:
cpu: "50m"
memory: "64Mi"
limits:
cpu: "200m"
memory: "256Mi"
volumeMounts:
- name: varlog
mountPath: /var/log
readOnly: true
- name: containers
mountPath: /var/lib/docker/containers
readOnly: true
- name: config
mountPath: /fluent-bit/etc/
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
volumes:
- name: varlog
hostPath:
path: /var/log
- name: containers
hostPath:
path: /var/lib/docker/containers
- name: config
configMap:
name: log-collector-config
Job and CronJob
One-time batch Job for database migration and a scheduled CronJob for nightly backups
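The backup filename embeds the run date; the snippet below shows the generated name, plus (commented out, since they need cluster access) commands to trigger the CronJob ad hoc and wait on the migration Job:

```shell
# Filename produced by the pg_dump command below, e.g. backup-20240115.dump
echo "backup-$(date +%Y%m%d).dump"

# Run the CronJob immediately as a one-off Job:
# kubectl create job --from=cronjob/nightly-backup manual-backup -n production

# Block until the migration Job finishes (or activeDeadlineSeconds hits):
# kubectl wait --for=condition=complete job/db-migration -n production --timeout=600s
```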
apiVersion: batch/v1
kind: Job
metadata:
name: db-migration
namespace: production
labels:
app: db-migration
spec:
backoffLimit: 3
activeDeadlineSeconds: 600
ttlSecondsAfterFinished: 86400
template:
metadata:
labels:
app: db-migration
spec:
restartPolicy: OnFailure
containers:
- name: migrate
image: app-migrations:1.2.0
command: ["./migrate", "--direction=up"]
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: postgres-credentials
key: connection-string
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
---
apiVersion: batch/v1
kind: CronJob
metadata:
name: nightly-backup
namespace: production
labels:
app: nightly-backup
spec:
schedule: "0 2 * * *"
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 7
failedJobsHistoryLimit: 3
jobTemplate:
spec:
backoffLimit: 2
template:
metadata:
labels:
app: nightly-backup
spec:
restartPolicy: OnFailure
containers:
- name: backup
image: postgres:16-alpine
command:
- /bin/sh
- -c
- |
pg_dump -h postgres-headless \
-U "$PGUSER" -d appdb \
-F c -f /backups/backup-$(date +%Y%m%d).dump
env:
- name: PGUSER
valueFrom:
secretKeyRef:
name: postgres-credentials
key: username
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: postgres-credentials
key: password
resources:
requests:
cpu: "100m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "512Mi"
volumeMounts:
- name: backup-storage
mountPath: /backups
volumes:
- name: backup-storage
persistentVolumeClaim:
claimName: backup-data
RBAC
Least-privilege access control with ServiceAccount, Role, and RoleBinding scoped to a single namespace
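Bindings can be verified without deploying anything by impersonating the ServiceAccount (the `kubectl auth can-i` calls are commented since they need cluster access):

```shell
# The impersonation identity for a ServiceAccount follows a fixed format:
echo "system:serviceaccount:production:app-service-account"

# Allowed by the Role (named-secret read):
# kubectl auth can-i get secrets/app-secrets -n production \
#   --as=system:serviceaccount:production:app-service-account

# Denied (no rule grants secret creation):
# kubectl auth can-i create secrets -n production \
#   --as=system:serviceaccount:production:app-service-account
```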
apiVersion: v1
kind: ServiceAccount
metadata:
name: app-service-account
namespace: production
labels:
app: web-application
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: app-role
namespace: production
labels:
app: web-application
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["secrets"]
resourceNames: ["app-secrets"]
verbs: ["get"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: app-role-binding
namespace: production
labels:
app: web-application
subjects:
- kind: ServiceAccount
name: app-service-account
namespace: production
roleRef:
kind: Role
name: app-role
apiGroup: rbac.authorization.k8s.io
PodDisruptionBudget
Maintain minimum availability during voluntary disruptions like node drains and cluster upgrades
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: web-application-pdb
namespace: production
labels:
app: web-application
spec:
minAvailable: 2
selector:
matchLabels:
app: web-application
unhealthyPodEvictionPolicy: IfHealthyBudget
---
# Alternative: maxUnavailable-based PDB for the postgres StatefulSet
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: postgres-pdb
namespace: production
labels:
app: postgres
spec:
maxUnavailable: 1
selector:
matchLabels:
app: postgres
ResourceQuota + LimitRange
Namespace-level guardrails for CPU, memory, and object counts with default container limits
apiVersion: v1
kind: ResourceQuota
metadata:
name: production-quota
namespace: production
spec:
hard:
requests.cpu: "8"
requests.memory: "16Gi"
limits.cpu: "16"
limits.memory: "32Gi"
pods: "40"
persistentvolumeclaims: "20"
services: "15"
secrets: "30"
configmaps: "30"
---
apiVersion: v1
kind: LimitRange
metadata:
name: production-limits
namespace: production
spec:
limits:
- type: Container
default:
cpu: "500m"
memory: "512Mi"
defaultRequest:
cpu: "100m"
memory: "128Mi"
max:
cpu: "4"
memory: "8Gi"
min:
cpu: "50m"
memory: "64Mi"
- type: PersistentVolumeClaim
max:
storage: "100Gi"
min:
storage: "1Gi"
ServiceMonitor
Prometheus Operator CRD for automatic metrics scrape target discovery based on label selectors
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: web-application
namespace: production
labels:
app: web-application
release: prometheus
spec:
selector:
matchLabels:
app: web-application
namespaceSelector:
matchNames:
- production
endpoints:
- port: http
path: /metrics
interval: 30s
scrapeTimeout: 10s
honorLabels: true
- port: http
path: /metrics/detailed
interval: 60s
scrapeTimeout: 15s
---
# PrometheusRule for alerting on the same application
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: web-application-alerts
namespace: production
labels:
app: web-application
release: prometheus
spec:
groups:
- name: web-application.rules
rules:
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{
job="web-application",
status=~"5.."
}[5m])) /
sum(rate(http_requests_total{
job="web-application"
}[5m])) > 0.05
for: 5m
labels:
severity: warning
annotations:
summary: "High 5xx error rate on web-application"
description: "Error rate is above 5% for 5 minutes"
- alert: PodRestartLooping
expr: |
increase(kube_pod_container_status_restarts_total{
namespace="production",
pod=~"web-application.*"
}[1h]) > 3
for: 10m
labels:
severity: critical
annotations:
summary: "Pod restart loop detected"
description: "Pod {{ $labels.pod }} restarted more than 3 times in the last hour"
TopologySpreadConstraints
Multi-zone pod spreading with topology constraints and anti-affinity to guarantee even distribution across failure domains
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
labels:
app: api-server
spec:
replicas: 6
selector:
matchLabels:
app: api-server
template:
metadata:
labels:
app: api-server
spec:
topologySpreadConstraints:
# Spread evenly across availability zones
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api-server
# Spread evenly across individual nodes within each zone
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: api-server
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- api-server
topologyKey: kubernetes.io/hostname
containers:
- name: api-server
image: api-server:2.4.0
ports:
- containerPort: 8080
name: http
resources:
requests:
cpu: "250m"
memory: "256Mi"
limits:
cpu: "1"
memory: "512Mi"
readinessProbe:
httpGet:
path: /healthz
port: http
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /healthz
port: http
initialDelaySeconds: 15
periodSeconds: 20
Init Containers
Pod initialization pattern with sequential init containers for dependency checks, config fetching, and filesystem preparation before the main application starts
apiVersion: v1
kind: Pod
metadata:
name: app-with-init
namespace: production
labels:
app: web-application
spec:
initContainers:
# 1. Wait for the database to become resolvable via DNS
- name: wait-for-db
image: busybox:1.36
command:
- /bin/sh
- -c
- |
echo "Waiting for postgres to be ready..."
until nslookup postgres.database.svc.cluster.local; do
echo "postgres not ready - sleeping 2s"
sleep 2
done
echo "postgres is reachable"
resources:
requests:
cpu: "10m"
memory: "16Mi"
limits:
cpu: "50m"
memory: "32Mi"
# 2. Download application config from S3
- name: fetch-config
image: amazon/aws-cli:2.15
command:
- /bin/sh
- -c
- |
aws s3 cp s3://config-bucket/production/app.conf /config/app.conf
aws s3 cp s3://config-bucket/production/features.json /config/features.json
echo "Config files downloaded"
env:
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
name: s3-credentials
key: access-key
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
name: s3-credentials
key: secret-key
- name: AWS_DEFAULT_REGION
value: "us-east-1"
volumeMounts:
- name: config-volume
mountPath: /config
resources:
requests:
cpu: "50m"
memory: "64Mi"
limits:
cpu: "200m"
memory: "128Mi"
# 3. Set ownership and permissions on data directory
- name: fix-permissions
image: busybox:1.36
command:
- /bin/sh
- -c
- |
chown -R 1000:1000 /data
chmod 750 /data
echo "Permissions set for uid 1000"
securityContext:
runAsUser: 0
volumeMounts:
- name: app-data
mountPath: /data
resources:
requests:
cpu: "10m"
memory: "16Mi"
limits:
cpu: "50m"
memory: "32Mi"
# Main application container
containers:
- name: app
image: web-application:3.1.0
ports:
- containerPort: 8080
name: http
securityContext:
runAsUser: 1000
runAsNonRoot: true
readOnlyRootFilesystem: true
volumeMounts:
- name: config-volume
mountPath: /etc/app
readOnly: true
- name: app-data
mountPath: /data
- name: tmp
mountPath: /tmp
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
volumes:
- name: config-volume
emptyDir: {}
- name: app-data
persistentVolumeClaim:
claimName: app-data
- name: tmp
emptyDir:
sizeLimit: "64Mi"
Sidecar Pattern
Multi-container pod with Envoy reverse proxy sidecar sharing process namespace for signal proxying, plus a Fluent Bit log shipper reading from a shared volume
# Envoy sidecar proxy alongside an application container
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-with-proxy
namespace: production
labels:
app: app-with-proxy
spec:
replicas: 3
selector:
matchLabels:
app: app-with-proxy
template:
metadata:
labels:
app: app-with-proxy
spec:
shareProcessNamespace: true
terminationGracePeriodSeconds: 30
containers:
# Primary application container
- name: app
image: web-application:3.1.0
ports:
- containerPort: 8080
name: app-http
volumeMounts:
- name: app-logs
mountPath: /var/log/app
- name: tmp
mountPath: /tmp
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "1"
memory: "512Mi"
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- "kill -SIGTERM 1 && sleep 5"
# Envoy sidecar proxy
- name: envoy-proxy
image: envoyproxy/envoy:v1.29-latest
ports:
- containerPort: 8443
name: https
- containerPort: 9901
name: envoy-admin
volumeMounts:
- name: envoy-config
mountPath: /etc/envoy
readOnly: true
resources:
requests:
cpu: "100m"
memory: "64Mi"
limits:
cpu: "500m"
memory: "128Mi"
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- "wget -qO- --post-data='' http://localhost:9901/healthcheck/fail && sleep 10"
# Fluent Bit log sidecar
- name: log-shipper
image: fluent/fluent-bit:3.0
volumeMounts:
- name: app-logs
mountPath: /var/log/app
readOnly: true
- name: fluent-bit-config
mountPath: /fluent-bit/etc/
readOnly: true
resources:
requests:
cpu: "25m"
memory: "32Mi"
limits:
cpu: "100m"
memory: "64Mi"
volumes:
- name: envoy-config
configMap:
name: envoy-proxy-config
- name: fluent-bit-config
configMap:
name: fluent-bit-sidecar-config
- name: app-logs
emptyDir:
sizeLimit: "256Mi"
- name: tmp
emptyDir:
sizeLimit: "64Mi"
Sealed Secrets
Encrypted secret storage for GitOps workflows using Bitnami Sealed Secrets, with namespace-scoped and cluster-wide sealing examples
# Create a SealedSecret from the CLI:
#
# 1. Write a regular Secret manifest:
# kubectl create secret generic db-credentials \
# --from-literal=username=app_user \
# --from-literal=password=s3cur3-pa55 \
# --dry-run=client -o yaml > secret.yaml
#
# 2. Seal it (namespace-scoped, the default):
# kubeseal --format yaml \
# --controller-name=sealed-secrets \
# --controller-namespace=kube-system \
# < secret.yaml > sealed-secret.yaml
#
# 3. Seal it (cluster-wide, reusable across namespaces):
# kubeseal --format yaml --scope cluster-wide \
# --controller-name=sealed-secrets \
# --controller-namespace=kube-system \
# < secret.yaml > sealed-secret-cluster.yaml
#
# 4. Apply the SealedSecret (controller decrypts it):
# kubectl apply -f sealed-secret.yaml
# Namespace-scoped SealedSecret (default)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
name: db-credentials
namespace: production
annotations:
sealedsecrets.bitnami.com/managed: "true"
spec:
encryptedData:
username: AgBy3i4OJSWK+PiTySYZZA9rO...truncated...
password: AgCtr8KJSWK+AiTtSYOZA7pQ...truncated...
template:
metadata:
name: db-credentials
namespace: production
labels:
app: web-application
type: Opaque
---
# Cluster-wide SealedSecret (usable in any namespace)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
name: shared-api-key
namespace: production
annotations:
sealedsecrets.bitnami.com/cluster-wide: "true"
spec:
encryptedData:
api-key: AgDf7kMNOPQR+XyZaBcDeF1gH...truncated...
api-secret: AgHj9lSTUVWX+AbCdEfGhI2jK...truncated...
template:
metadata:
name: shared-api-key
annotations:
sealedsecrets.bitnami.com/cluster-wide: "true"
type: Opaque
VPA (VerticalPodAutoscaler)
Vertical pod autoscaling with recommendation-only and auto-update modes, bounded by container-level min/max resource policies
# Recommendation-only mode: observe suggestions without applying them
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: web-application-vpa-recommend
namespace: production
labels:
app: web-application
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: web-application
updatePolicy:
updateMode: "Off"
resourcePolicy:
containerPolicies:
- containerName: web-application
controlledResources: ["cpu", "memory"]
controlledValues: RequestsAndLimits
minAllowed:
cpu: "50m"
memory: "64Mi"
maxAllowed:
cpu: "2"
memory: "2Gi"
---
# Auto mode: VPA evicts and resizes pods to match recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-server-vpa-auto
namespace: production
labels:
app: api-server
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
updatePolicy:
updateMode: "Auto"
minReplicas: 2
resourcePolicy:
containerPolicies:
- containerName: api-server
controlledResources: ["cpu", "memory"]
controlledValues: RequestsAndLimits
minAllowed:
cpu: "100m"
memory: "128Mi"
maxAllowed:
cpu: "4"
memory: "4Gi"
# Exclude sidecar containers from VPA control
- containerName: log-shipper
mode: "Off"
Kustomize Overlay
Full Kustomize base and overlay structure with configMapGenerator, secretGenerator, strategic merge patches, and JSON 6902 patches for multi-environment GitOps
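Overlays are rendered and applied from the CLI; the commands below are the usual loop (commented, since they assume the repo layout above and cluster access):

```shell
# Render an overlay locally and inspect the output:
# kustomize build overlays/production

# Preview what would change in the cluster, then apply:
# kubectl diff -k overlays/production
# kubectl apply -k overlays/production
```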
# base/kustomization.yaml
# Shared resources across all environments
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
- networkpolicy.yaml
- hpa.yaml
commonLabels:
app.kubernetes.io/managed-by: kustomize
app.kubernetes.io/part-of: web-platform
configMapGenerator:
- name: app-config
literals:
- LOG_LEVEL=info
- CACHE_TTL=300
- METRICS_ENABLED=true
secretGenerator:
- name: app-tls
files:
- tls.crt=certs/tls.crt
- tls.key=certs/tls.key
type: kubernetes.io/tls
---
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: production
namePrefix: prod-
commonLabels:
environment: production
tier: frontend
replicas:
- name: web-application
count: 5
configMapGenerator:
- name: app-config
behavior: merge
literals:
- LOG_LEVEL=warn
- CACHE_TTL=3600
- DATABASE_HOST=postgres.database.svc.cluster.local
images:
- name: web-application
newName: registry.example.com/web-application
newTag: v2.4.0
# Kustomize v5 deprecates patchesStrategicMerge/patchesJson6902 in favor of `patches`
patches:
- path: resource-limits.yaml
- path: tolerations.yaml
- target:
group: apps
version: v1
kind: Deployment
name: web-application
patch: |-
- op: add
path: /spec/template/spec/containers/0/env/-
value:
name: ENVIRONMENT
value: production
- op: replace
path: /spec/template/spec/containers/0/resources/limits/memory
value: 1Gi
---
# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: staging
namePrefix: stg-
commonLabels:
environment: staging
replicas:
- name: web-application
count: 2
configMapGenerator:
- name: app-config
behavior: merge
literals:
- LOG_LEVEL=debug
- DATABASE_HOST=postgres.staging.svc.cluster.local
images:
- name: web-application
newName: registry.example.com/web-application
newTag: v2.5.0-rc1
Helm Charts
Curated Helm charts for deploying common applications and services on Kubernetes.
Traefik
Kubernetes Ingress Controller with automatic HTTPS
# values.yaml for Traefik
deployment:
replicas: 2
globalArguments:
- "--global.sendAnonymousUsage=false"
- "--global.checkNewVersion=false"
additionalArguments:
- "--log.level=DEBUG"
- "--accesslog=true"
- "--accesslog.format=json"
- "--entrypoints.websecure.http.tls.certResolver=letsencrypt"
ingressClass:
enabled: true
isDefaultClass: true
ports:
web:
port: 8000
exposedPort: 80
redirections:
entryPoint:
to: websecure
scheme: https
permanent: true
websecure:
port: 8443
exposedPort: 443
tls:
enabled: true
certificatesResolvers:
letsencrypt:
acme:
email: [email protected]
storage: /data/acme.json
dnsChallenge:
provider: cloudflare
resolvers:
- "1.1.1.1:53"
- "8.8.8.8:53"
delayBeforeCheck: 30
providers:
file:
directory: /etc/traefik/dynamic
watch: true
kubernetesIngress:
publishedService:
enabled: true
metrics:
prometheus:
entryPoint: metrics
addEntryPointsLabels: true
addServicesLabels: true
addRoutersLabels: true
buckets: "0.1,0.3,1.2,5.0"
service:
type: LoadBalancer
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
persistence:
enabled: true
size: 128Mi
storageClass: longhorn
dashboard:
enabled: true
ingressRoute: true
Longhorn
Cloud-native distributed storage for Kubernetes
# values.yaml for Longhorn
persistence:
defaultClass: true
defaultClassReplicaCount: 2
reclaimPolicy: Retain
defaultFsType: ext4
defaultSettings:
backupTarget: "nfs://10.42.0.50:/mnt/backups/longhorn"
backupTargetCredentialSecret: ""
defaultReplicaCount: 2
storageMinimalAvailablePercentage: 15
defaultDataLocality: best-effort
autoDeletePodWhenVolumeDetachedUnexpectedly: true
replicaSoftAntiAffinity: true
storageOverProvisioningPercentage: 150
guaranteedInstanceManagerCPU: 12
concurrentAutomaticEngineUpgradePerNode: 1
csi:
attacherReplicaCount: 2
provisionerReplicaCount: 2
snapshotterReplicaCount: 2
ingress:
enabled: true
ingressClassName: traefik
host: longhorn.example.com
tls: true
tlsSecret: longhorn-tls
longhornUI:
replicas: 2
recurringJobSelector:
enable: true
# Apply recurring jobs after install via CRD:
# apiVersion: longhorn.io/v1beta2
# kind: RecurringJob
# metadata:
# name: snapshot-hourly
# namespace: longhorn-system
# spec:
# cron: "0 * * * *"
# task: snapshot
# groups:
# - default
# retain: 24
# concurrency: 2
# labels:
# type: hourly
#
# ---
# apiVersion: longhorn.io/v1beta2
# kind: RecurringJob
# metadata:
# name: backup-daily
# namespace: longhorn-system
# spec:
# cron: "0 2 * * *"
# task: backup
# groups:
# - default
# retain: 14
# concurrency: 1
# labels:
# type: daily
Prometheus Stack
Complete monitoring solution with Prometheus, Grafana, and Alertmanager
# values.yaml for kube-prometheus-stack
prometheus:
prometheusSpec:
retention: 30d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: longhorn
resources:
requests:
storage: 50Gi
grafana:
adminPassword: changeme # placeholder; prefer admin.existingSecret in production
persistence:
enabled: true
size: 10Gi
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: longhorn
resources:
requests:
storage: 5Gi
ArgoCD
Declarative GitOps continuous delivery for Kubernetes
# values.yaml for argo-cd
server:
replicas: 2
ingress:
enabled: true
ingressClassName: traefik
hosts:
- argocd.example.com
tls:
- secretName: argocd-tls
hosts:
- argocd.example.com
controller:
replicas: 1
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
repoServer:
replicas: 2
resources:
requests:
cpu: "100m"
memory: "256Mi"
configs:
repositories:
private-repo:
url: https://git.example.com/infra/k8s-manifests.git
type: git
passwordSecret:
name: repo-credentials
key: password
usernameSecret:
name: repo-credentials
key: username
redis-ha:
enabled: true
CloudNativePG
Kubernetes operator for managing PostgreSQL clusters with built-in backup and failover
# values.yaml for cloudnative-pg operator
# Install the operator first, then create Cluster resources
# Operator configuration
crds:
create: true
monitoring:
podMonitorEnabled: true
# --- After operator install, create a Cluster CR ---
# apiVersion: postgresql.cnpg.io/v1
# kind: Cluster
# metadata:
# name: app-database
# namespace: production
# spec:
# instances: 3
# primaryUpdateStrategy: unsupervised
#
# storage:
# size: 50Gi
# storageClass: longhorn
#
# postgresql:
# parameters:
# shared_buffers: "256MB"
# max_connections: "200"
#
# backup:
# barmanObjectStore:
# destinationPath: "s3://backups/cnpg/"
# s3Credentials:
# accessKeyId:
# name: backup-credentials
# key: ACCESS_KEY_ID
# secretAccessKey:
# name: backup-credentials
# key: SECRET_ACCESS_KEY
# retentionPolicy: "30d"
#
# monitoring:
# enablePodMonitor: true
Velero
Cluster backup and disaster recovery with scheduled snapshots and cross-cluster restore
# values.yaml for velero
configuration:
backupStorageLocation:
- name: default
provider: aws
bucket: cluster-backups
config:
region: us-east-1
s3ForcePathStyle: true
s3Url: https://s3.example.com
volumeSnapshotLocation:
- name: default
provider: aws
config:
region: us-east-1
credentials:
secretContents:
cloud: |
[default]
aws_access_key_id=YOUR_ACCESS_KEY
aws_secret_access_key=YOUR_SECRET_KEY
schedules:
daily-backup:
disabled: false
schedule: "0 3 * * *"
template:
ttl: "168h"
includedNamespaces:
- production
- staging
storageLocation: default
volumeSnapshotLocations:
- default
weekly-full:
disabled: false
schedule: "0 1 * * 0"
template:
ttl: "720h"
includedNamespaces:
- "*"
storageLocation: default
initContainers:
- name: velero-plugin-for-aws
image: velero/velero-plugin-for-aws:v1.9.0
volumeMounts:
- mountPath: /target
name: plugins
Cert-Manager
Automated TLS certificate lifecycle management with Let's Encrypt and private CA support
# values.yaml for cert-manager
installCRDs: true
replicaCount: 2
resources:
requests:
cpu: "50m"
memory: "128Mi"
limits:
cpu: "200m"
memory: "256Mi"
ingressShim:
defaultIssuerName: letsencrypt-prod
defaultIssuerKind: ClusterIssuer
defaultIssuerGroup: cert-manager.io
prometheus:
enabled: true
servicemonitor:
enabled: true
webhook:
replicaCount: 2
resources:
requests:
cpu: "25m"
memory: "32Mi"
cainjector:
replicaCount: 1
resources:
requests:
cpu: "25m"
memory: "64Mi"
Kubernetes Operators
Examples and guides for using Kubernetes Operators to automate application lifecycle management.
Cert-Manager
Automate certificate management in Kubernetes
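With the ClusterIssuer below in place, certificates are requested declaratively; a sketch of a Certificate that would produce the `app-tls-secret` consumed by the Ingress example earlier:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-cert
  namespace: production
spec:
  secretName: app-tls-secret   # written and renewed by cert-manager
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - app.example.com
```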
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml
# ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: [email protected]
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: traefik
External Secrets
Sync secrets from external secret stores (Vault, AWS, etc.)
helm install external-secrets external-secrets/external-secrets -n external-secrets --create-namespace
# ExternalSecret syncing from Vault
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-secrets
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-backend
kind: ClusterSecretStore
target:
name: app-secrets
data:
- secretKey: database-password
remoteRef:
key: secret/data/production/db
property: password
Sealed Secrets Controller
Encrypt Kubernetes Secrets into SealedSecrets safe for storage in Git, decrypted only by the in-cluster controller
helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system --set-string fullnameOverride=sealed-secrets-controller
# Install kubeseal CLI:
# brew install kubeseal (macOS)
# wget https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.27.0/kubeseal-0.27.0-linux-amd64.tar.gz
# tar xfz kubeseal-*.tar.gz && install -m 755 kubeseal /usr/local/bin/kubeseal
# Fetch the controller's public key (for offline sealing):
# kubeseal --fetch-cert --controller-name=sealed-secrets-controller \
# --controller-namespace=kube-system > pub-cert.pem
# Create and seal a secret:
# kubectl create secret generic db-creds \
# --from-literal=password=hunter2 \
# --dry-run=client -o yaml | \
# kubeseal --format yaml --cert pub-cert.pem > sealed-db-creds.yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
name: db-creds
namespace: production
spec:
encryptedData:
password: AgBy3i4OJSWK+PiTySYZZA9rO43cGDnR...
template:
metadata:
name: db-creds
namespace: production
labels:
app: web-application
type: Opaque Kyverno Policy Engine
Kubernetes-native policy management for admission control, mutation, and resource validation without learning a new language
helm install kyverno kyverno/kyverno -n kyverno --create-namespace --set replicaCount=3

# ClusterPolicy: require labels, block latest tag, enforce resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
  annotations:
    policies.kyverno.io/title: Require Labels
    policies.kyverno.io/severity: medium
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: check-required-labels
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Labels 'app' and 'owner' are required on all Pods."
      pattern:
        metadata:
          labels:
            app: "?*"
            owner: "?*"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  background: true
  rules:
  - name: validate-image-tag
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Using 'latest' tag is not allowed. Pin to a specific version."
      pattern:
        spec:
          containers:
          - image: "!*:latest"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Audit
  background: true
  rules:
  - name: check-resource-limits
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "CPU and memory limits are required for all containers."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"
                cpu: "?*"

Kubernetes Best Practices
Tips, tricks, and best practices for managing Kubernetes clusters effectively and securely.
Resource Requests and Limits
Always define CPU and memory requests/limits to ensure fair scheduling and prevent resource starvation.
- Set requests to typical usage, limits to maximum acceptable burst
- Use LimitRange to enforce defaults namespace-wide so no pod runs unbounded
- Monitor actual usage with Prometheus and tune values quarterly
- Avoid setting CPU limits too tight — CPU throttling degrades latency without killing the pod
- Memory limits should be hard: OOM kills are better than node-level memory pressure
- Use VPA in recommendation mode to gather data before committing to values
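The LimitRange tip above can be sketched as a namespace-wide default; the specific values here are illustrative, not recommendations:

```yaml
# Defaults applied to any container in the namespace that omits resources
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:            # becomes the container's limits when unset
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:     # becomes the container's requests when unset
      cpu: "100m"
      memory: "128Mi"
```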
Security Contexts
Run containers as non-root users and use read-only filesystems where possible.
- Set runAsNonRoot: true in the pod security context to block root containers at admission
- Use readOnlyRootFilesystem: true and mount writable emptyDir volumes only where needed
- Drop all Linux capabilities with drop: ["ALL"] and add back only what the process requires
- Set allowPrivilegeEscalation: false to prevent child processes from gaining more privileges
- Use seccompProfile type RuntimeDefault to apply the container runtime's default syscall filter
- Assign a non-zero runAsUser and runAsGroup to avoid UID 0 even inside distroless images
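The settings above combine into pod-level and container-level security contexts roughly as follows; the image and UID are placeholders, and a real workload may need additional writable mounts:

```yaml
# Hardened pod: non-root, read-only root filesystem, no capabilities
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001          # illustrative non-zero UID
    runAsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: nginx:1.25-alpine  # placeholder image
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: tmp
      mountPath: /tmp         # writable scratch space via emptyDir
  volumes:
  - name: tmp
    emptyDir: {}
```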
Health Probes
Configure liveness, readiness, and startup probes for reliable deployments.
- Readiness probes gate traffic: a failing readiness probe removes the pod from Service endpoints
- Liveness probes trigger restarts: use them to recover from deadlocks, not slow responses
- Startup probes run first and disable liveness/readiness until the app is initialized
- Set initialDelaySeconds high enough to survive cold starts but low enough to detect real failures
- Never point liveness probes at endpoints that depend on downstream services — use a local /healthz
- Use different paths for readiness (/ready) and liveness (/healthz) so they can fail independently
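As a container-spec sketch, the three probe types might look like this; paths and timings are illustrative and should be tuned to the application:

```yaml
# Startup probe runs first and suppresses the other two until it succeeds
startupProbe:
  httpGet:
    path: /healthz
    port: http
  failureThreshold: 30   # tolerates up to 30 * 5s = 150s of cold start
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /ready         # may check downstream dependencies
    port: http
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /healthz       # local check only; failure triggers a restart
    port: http
  periodSeconds: 10
```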
Network Policies
Implement network segmentation to control traffic between pods and namespaces.
- Start with default-deny for both ingress and egress in every namespace
- Explicitly allow DNS egress (UDP/TCP 53) or pods cannot resolve service names
- Separate concerns by namespace (frontend, backend, database) and restrict cross-namespace traffic
- Use namespaceSelector with labels to allow traffic from specific namespaces like the ingress controller
- Test policies with a curl pod before enforcing — a misconfigured policy can take down an entire namespace
- Label namespaces consistently (e.g., kubernetes.io/metadata.name) to make selectors predictable
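A minimal starting point for the default-deny-plus-DNS pattern described above, assuming the namespace names used elsewhere in this page:

```yaml
# Deny all ingress and egress for every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}            # empty selector matches all pods
  policyTypes: ["Ingress", "Egress"]
---
# Re-allow DNS egress to kube-system so service names still resolve
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```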
Pod Scheduling
Control where pods land using affinity rules, topology spread constraints, and node selectors for high availability.
- Spread replicas across availability zones with topologySpreadConstraints and maxSkew: 1
- Use pod anti-affinity to prevent multiple replicas of the same app on one node
- Prefer soft (preferred) anti-affinity over hard (required) to avoid unschedulable pods in small clusters
- Use node selectors for simple hardware requirements like GPU or SSD-backed nodes
- Combine topology spread with PodDisruptionBudget to survive zone failures and rolling upgrades
- Set whenUnsatisfiable: ScheduleAnyway for node-level spread so the scheduler degrades gracefully
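A pod-spec fragment sketching zone spread plus soft anti-affinity, reusing the app label from the Deployment example earlier on this page:

```yaml
# Inside the pod template's spec:
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule    # hard spread across zones
    labelSelector:
      matchLabels:
        app: web-application
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:  # soft, stays schedulable
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: web-application
```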
Secret Management
Keep sensitive data out of manifests and version control using external secret stores and encryption at rest.
- Never commit plain Kubernetes Secrets or credentials to git repositories
- Use External Secrets Operator to sync from HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault
- Use Sealed Secrets as an alternative when you need encrypted secrets stored directly in git
- Rotate secrets on a regular schedule and audit access with RBAC and audit logging
- Enable encryption at rest for etcd so secrets on disk are not stored in plaintext
- Mount secrets as files instead of environment variables to reduce exposure in process listings and crash dumps
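The file-mount tip can be sketched like this, assuming the db-creds Secret from the Sealed Secrets example above; the image is a placeholder:

```yaml
# Secret keys appear as read-only files, e.g. /etc/secrets/password,
# instead of showing up in `env` output or crash dumps
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: production
spec:
  containers:
  - name: app
    image: nginx:1.25-alpine   # placeholder image
    volumeMounts:
    - name: db-creds
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: db-creds
    secret:
      secretName: db-creds
```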
Namespace Organization
Structure cluster tenancy with namespaces per team or environment, enforced with quotas and default-deny policies.
- Use labels like team, environment, and cost-center on every namespace for ownership and chargeback
- Separate production, staging, and development into distinct namespaces with different quota limits
- Apply a default-deny NetworkPolicy in each namespace and explicitly allow required traffic
- Set LimitRange defaults so containers without resource specs get bounded automatically
- Apply ResourceQuota to cap total CPU, memory, and object counts per namespace
- Automate namespace provisioning with a controller or Helm chart so every namespace ships with policies pre-applied
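The quota tip above can be sketched as a per-namespace cap; every value here is illustrative and should be sized to the team's actual footprint:

```yaml
# Cap aggregate resource usage and object counts for one namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"       # sum of all pod CPU requests
    requests.memory: 20Gi
    limits.cpu: "20"         # sum of all pod CPU limits
    limits.memory: 40Gi
    pods: "50"
    services: "20"
```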