user@argobox:~/journal/2026-01-21-the-kernel-that-panicked-every-three-minutes
$ cat entry.md

The Kernel That Panicked Every Three Minutes

Date: 2026-01-21
Duration: About an hour (felt longer)
Issue: Continuous reboot loop
Root Cause: K3s pod crashes → kernel panic → auto-reboot → repeat


The Problem

Alpha-Centauri kept rebooting. Every 1-3 minutes. No warning, no pattern, just — reboot.

SSH in, run a command, and before I could finish typing, the connection dropped. System was back up 30 seconds later. Started the investigation again. Dropped again.


The Timeline

18:30 - System boot
18:33 - System reboot (3 min uptime)
18:34 - System boot
18:35 - System reboot (1 min uptime)
18:36 - System boot
18:39 - System reboot (3 min uptime)
... continues for 45 minutes ...

The system couldn’t stay up long enough to debug itself.


The Hunt

Had to work fast. SSH in, run one command, copy the output before the connection died.

First check: what happened before the last reboot?

journalctl -b -1 -n 50    # -b -1: logs from the previous boot, -n 50: last 50 entries

CNI bridge state changes. Hundreds of them.

cni0: port 1(veth...) entered disabled state
cni0: port 1(veth...) entered forwarding state
cni0: port 1(veth...) entered disabled state

The Kubernetes CNI network was thrashing. Interfaces being created and destroyed faster than I could scroll.


The Pods

Checked K3s:

kubectl get pods --all-namespaces
NAME                    READY   STATUS              RESTARTS
openwebui-xxx           0/1     CrashLoopBackOff    47
quartz-vault-xxx        0/1     CrashLoopBackOff    39

Two pods crash-looping. 47 restarts. 39 restarts. They’d been crashing for hours.
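
With restart counts like that, the usual next step is the previous container's output and the pod's events. The pod names below are the redacted ones from the listing above; add -n <namespace> if they don't live in the default namespace:

kubectl logs openwebui-xxx --previous   # stdout/stderr from the last crashed container
kubectl describe pod openwebui-xxx      # events, exit codes, restart reasons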

Every crash:

  1. Pod dies
  2. Network namespace destroyed
  3. CNI bridge interface removed
  4. Pod restarts
  5. Network namespace created
  6. CNI bridge interface added
  7. Pod crashes again
  8. Repeat

Hundreds of network interface state changes per minute.
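
A rough way to put a number on the churn, counting bridge messages in the previous boot's kernel log:

journalctl -k -b -1 | grep -c 'cni0'    # kernel messages mentioning the CNI bridge, previous boot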


The Kernel Panic

The network stack couldn’t handle it. All those rapid interface state changes, the memory churn from pod restarts, the CNI bridge thrashing — something broke deep in the kernel.

Panic.

But I didn’t see the panic. Because of this:

sysctl kernel.panic
# kernel.panic = 10

That was this box’s default: when the kernel panics, wait 10 seconds, then reboot automatically.

For production servers with monitoring, this is smart — automatic recovery.

For debugging, this is a nightmare — the system reboots before you can read the panic message.
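
If the panic text itself matters, it can sometimes be recovered even with auto-reboot: machines with pstore (EFI-backed or ramoops) keep the last console output of a crash across reboots. Whether anything lands there depends on the hardware and kernel config, so treat this as a maybe:

ls /sys/fs/pstore/            # crash records, if pstore is available on this machine
cat /sys/fs/pstore/dmesg-*    # any captured panic/oops text from previous crashes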


The Loop

The full sequence:

  1. System boots
  2. K3s starts
  3. Pods start crashing (within 30 seconds)
  4. CNI network thrashes
  5. Kernel panics (1-3 minutes)
  6. Wait 10 seconds
  7. Automatic reboot
  8. Return to step 1

Every. Single. Time.


The Fix

First, break the reboot loop:

sysctl -w kernel.panic=0                      # apply now: halt on panic instead of rebooting
echo 'kernel.panic = 0' >> /etc/sysctl.conf   # persist the setting across reboots

Now the system will halt on panic instead of rebooting. Not ideal for production, but essential for debugging.
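
Once the root cause is actually fixed, a longer timeout is a reasonable middle ground: the box still recovers on its own, but there's time to read or photograph the panic first. The 60 seconds here is an arbitrary example, not an official recommendation:

sysctl -w kernel.panic=60    # reboot 60 seconds after a panic: long enough to read the message
# and update the kernel.panic line in /etc/sysctl.conf to match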

Second, stop the chaos:

systemctl stop k3s
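
If another panic sneaks in before the cleanup is done, disabling the unit keeps K3s from coming back on the next boot (assuming the stock k3s.service unit name from the standard install):

systemctl disable --now k3s   # stop it now and keep it from starting at boot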

No K3s, no pod crashes, no CNI thrashing, no kernel panic.

The system stayed up. First stable boot in an hour.


Was It My Code?

I was running a custom gateway service. First instinct: I broke something.

Searched the entire codebase:

grep -r "reboot\|shutdown.*-r\|systemctl.*reboot" bin/ lib/ scripts/

Zero matches. My code doesn’t reboot anything.

Checked the systemd service:

[Service]
Restart=always

Restart the process whenever it exits. Not reboot the system.

The gateway was a victim, not a perpetrator. It crashed because the kernel underneath it crashed.
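
Worth noting while staring at that unit: Restart=always with no rate limit will restart a crashing process forever, which is its own small version of the same loop. A drop-in like this makes systemd back off instead; the gateway.service name and the numbers are placeholders, not the real unit:

mkdir -p /etc/systemd/system/gateway.service.d
cat > /etc/systemd/system/gateway.service.d/backoff.conf <<'EOF'
[Unit]
# give up after 5 failed starts within 5 minutes
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5
EOF
systemctl daemon-reload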


The Actual Culprit

Primary cause: K3s pods crash-looping

Contributing factors:

  • 8GB RAM shared between K3s, monitoring, gateway, and other services
  • Aggressive pod restart policy
  • CNI network bridge instability under rapid state changes

Trigger: Something caused the pods to start crashing (OOM? config error? dependency failure?)

Amplifier: kernel.panic=10 turned crashes into an unbreakable loop
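
The OOM question is at least checkable: the previous boot's kernel log will show the OOM killer if it fired, and once K3s is back up, a crashed pod's last state will say OOMKilled if that was the reason (pod name is the redacted one from earlier):

journalctl -k -b -1 | grep -iE 'out of memory|oom-kill'      # kernel OOM killer activity, previous boot
kubectl describe pod openwebui-xxx | grep -iA3 'last state'  # look for Reason: OOMKilled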


The Lessons

Ubuntu’s panic default is dangerous for development. Set kernel.panic=0 on any machine you might need to debug.

K3s on 8GB RAM is risky. The control plane alone wants 1GB. Add pods, and you’re living on the edge. Consider 16GB+ or dedicated K3s nodes.

Crash-looping pods can take down a host. The CNI network changes cascade into kernel-level instability. Resource limits and proper health checks matter.

Check system logs before blaming your code. I spent 15 minutes suspecting my gateway before checking journalctl. The kernel panic was right there in the logs.


The Prevention

Immediate:

  • kernel.panic=0 prevents the reboot loop
  • K3s stopped until pods are fixed

Short-term:

  • Fix or delete the crash-looping pods
  • Add resource limits to K3s workloads (sketched below)
  • Migrate gateway to isolated LXC container
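
A sketch of the resource-limit step, assuming the crash-looping pods come from a Deployment named openwebui; the real controller names, namespaces, and numbers would all need checking first:

kubectl set resources deployment/openwebui \
  --requests=cpu=100m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi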

Long-term:

  • Dedicated K3s node(s) with more RAM
  • Proper monitoring with reboot alerts
  • Health checks that prevent infinite crash loops

The kernel wasn’t broken. It was being tortured by Kubernetes. When a pod racks up 47 restarts, something has to give.