The Calm Before the Storm

The remote network (192.168.20.x) is usually quiet. It houses the heavy lifters: storage servers. The Unraid box is the flagship — terabytes of data, critical Docker containers.

This week, things went sideways. Twice.

Incident 1: The Log Flood (Jan 20)

It started with an alert. The Unraid server’s log partition was at 94% capacity.

For Unraid, where /var/log lives in RAM, this is deadly. Hit 100% and services crash, the GUI dies, and you’re looking at a hard reboot.
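
For anyone checking their own box, one df call shows how close the RAM-backed log partition is to the cliff:

# /var/log is a small tmpfs on stock Unraid; watch the Use% column
df -h /var/log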

The Detective Work

I SSH’d in through the Proxmox jump box and ran the usual suspects.

du -sh /var/log/* showed… nothing huge?

That’s when I remembered ghost files: files that have been deleted but are still held open by some process, so the space never actually comes back.

lsof +L1 /var/log

There it was. rsyslogd was still holding a handle to a deleted log file, which had grown to 112MB (massive for a RAM disk).

But why?

I dug into the active logs and found a scream of errors:

emhttpd: error: getxattr on /mnt/user/synology_mount: Operation not supported

The Synology NAS units are mounted via rclone into the main /mnt/user tree. Unraid’s management daemon (emhttpd) constantly scans /mnt/user. It tries to read extended attributes (xattrs). Rclone mounts don’t support xattrs.

Result: 172,800 errors per day. That works out to two every single second, around the clock.
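
If you want to reproduce the complaint by hand, the attr tools (assuming they’re installed on the box) ask for the same metadata emhttpd does, and should fail the same way on an rclone mount:

# Dump extended attributes; on an rclone mount this should fail with the same "Operation not supported"
getfattr -d /mnt/user/synology_mount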

The Fix

First, stop the bleeding.

kill -HUP $(cat /var/run/rsyslogd.pid)

The HUP makes rsyslogd close and reopen its log files, which finally released the deleted handle. Usage dropped to 6%.

Second, apply the band-aid. I couldn’t move the mount points without breaking containers, so I told rsyslog to ignore the noise.

Added to /boot/config/rsyslog.conf:

if $programname == "emhttpd" and $msg contains "synology" then stop
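
RainerScript is unforgiving about quoting, so it’s worth having rsyslogd dry-run the edited file before relying on it:

# Config check only; a malformed filter line gets reported here instead of taking logging down
rsyslogd -N1 -f /boot/config/rsyslog.conf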

Silence is golden.

Incident 2: The Network Collapse (Jan 22)

Two days later, I got ambitious. I wanted faster transfer speeds for the Unraid box. I had a spare Thunderbolt 10GbE adapter. I plugged it in.

IMMEDIATELY, the server went dark.

  • No Web UI
  • No SSH
  • No Docker services
  • Ping worked… sometimes?

I unplugged the adapter. It didn’t come back.

The “Safe Mode” Scramble

I had to boot into Safe Mode (no plugins, no Docker) to get access. Something had corrupted the network stack.

It turned out to be a perfect storm of three separate failures:

1. The Boot Script

My /boot/config/go file (Unraid’s startup script) had syntax errors. Missing then keywords. It had been failing silently for who knows how long.
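
Bash can catch this without executing a thing. A parse-only pass is worth running whenever the go file changes (a general habit, not something Unraid does for you):

# Syntax check only, nothing runs; a missing 'then' gets reported with its line number
bash -n /boot/config/go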

2. Docker Config

I had told Docker to use wlan0 (WiFi) as a custom network at some point. The WiFi interface was down. Docker refused to start, dragging the system down with it.
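
A ten-second sanity check makes this failure obvious: ask the kernel whether the interface Docker is pinned to even exists and is up.

# "state DOWN" (or "Device does not exist") here explains why Docker's custom network won't come up
ip link show wlan0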

3. The Tailscale Routing Trap

This was the kicker.

The Proxmox hypervisor acts as a subnet router, advertising 192.168.20.0/24 to the Tailscale network.
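
For context, the subnet-router side is just Tailscale’s standard route advertisement, roughly this on the Proxmox host (the real invocation there likely carries other flags too):

# On the Proxmox hypervisor: advertise the storage subnet to the tailnet
tailscale up --advertise-routes=192.168.20.0/24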

When the Unraid server connected to Tailscale, it learned the route to its own subnet via Tailscale.

So when I tried to SSH from the hypervisor (192.168.20.100) to Unraid (192.168.20.50):

  1. Packet goes Hypervisor → Unraid (Local LAN)
  2. Unraid replies… but thinks the fastest path to 192.168.20.0/24 is via tailscale0!
  3. Packet goes Unraid → Tailscale relay → Hypervisor
  4. Hypervisor drops it: a reply arriving over the wrong path fails the reverse-path and connection-state checks (classic asymmetric routing)
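
In hindsight, one command on the Unraid box exposes the asymmetry: ask the kernel which way it would send a packet back to the hypervisor. If the Tailscale route is winning, the answer names tailscale0 instead of br0/eth0.

# Which interface carries the reply? Seeing tailscale0 here means the LAN path has been hijacked
ip route get 192.168.20.100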

The Resolution

I had to teach Unraid to ignore its own tail.

Added a priority rule to the routing table:

ip rule add to 192.168.20.0/24 lookup main priority 5200

This forces traffic destined for the local LAN to use the main routing table (eth0/br0), ignoring Tailscale’s magic routes.
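
Two follow-ups are worth it: confirm the rule landed, and make it survive a reboot, since ip rules don’t persist on their own. Appending it to the go file is my assumption about the tidiest place on Unraid, not something dictated by the incident itself.

# Confirm: the new rule should list ahead of Tailscale's own entries (they sit in the 52xx range)
ip rule show

# Persist across reboots by appending the same line to the startup script
echo 'ip rule add to 192.168.20.0/24 lookup main priority 5200' >> /boot/config/go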

Lessons Learned

Ghost logs are real. Always check lsof +L1 when disk space vanishes but du shows nothing.

Verify your boot scripts. A script with a syntax error might not run anything after the error. No warnings. Just silent failure.

Thunderbolt on Unraid is cursed. Without official support, it’s a gamble. Sticking to USB 2.5GbE adapters for now.

Routing is hard. When you have a subnet router inside the subnet it routes, asymmetric routing will bite you.


The server is stable. For now.