The Drone That Wasn't A Drone

Date: 2026-01-28
Issue: SSH to drone reached wrong machine
Root Cause: IP address conflict between LXC container and laptop
Lesson: When the OS is wrong, check if you’re on the right machine


The Setup

The build swarm monitor shows drone-Tau-Ceti as OFFLINE. This drone runs in an LXC container on Tau-Ceti-Lab, a dual-boot machine that sometimes accidentally boots into Windows.

The container was assigned IP 10.42.0.174. Should be simple to diagnose.


The Confusion

I SSH into 10.42.0.174 to check on the drone.

ssh <user>@10.42.0.174

Wait. That’s not right.

Welcome to Zorin OS 17.2

Zorin OS? The container runs Gentoo. The host runs Gentoo. Where did Zorin come from?

I run cat /etc/os-release. Zorin. I check hostname. It says sam.

I’m not on my drone. I’m on my wife’s laptop.


The Investigation

I SSH into the actual host (10.42.0.194, the bare-metal machine running the LXC container):

ssh <user>@10.42.0.194
lxc-attach -n drone-Tau-Ceti -- cat /etc/gentoo-release

Output: Gentoo Base System release 2.18

The container is fine. It’s running Gentoo. It’s configured correctly. It’s just… unreachable via its assigned IP.

Check the ARP table:

arp -a | grep 10.42.0.174
? (10.42.0.174) at 5c:e9:31:60:88:90 [ether] on eth0

That MAC address, 5c:e9:31:60:88:90, belongs to sam’s WiFi adapter, not the container’s virtual NIC.

The container has its own MAC:

lxc-attach -n drone-Tau-Ceti -- ip addr show eth0 | grep ether
link/ether a2:62:39:d6:f4:f3

Two completely different devices. Same IP. Classic IP address conflict.
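The comparison above can be scripted. This is a minimal sketch, assuming the iproute2 `ip neigh show` output format (the equivalent of the `arp -a` call used here); `check_mac` is a hypothetical helper name, not part of any tool.

```shell
# Report which MAC a neighbour table lists for an IP and whether it matches
# the MAC we expect. Reads `ip neigh show` output on stdin.
# check_mac is a hypothetical helper, not an existing command.
check_mac() {
    local ip="$1" expected="$2" seen
    # Lines look like: 10.42.0.174 dev eth0 lladdr 5c:e9:31:60:88:90 REACHABLE
    seen=$(awk -v ip="$ip" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") { print tolower($(i + 1)); exit }
        }')
    if [ -z "$seen" ]; then
        echo "no neighbour entry for $ip"
        return 2
    fi
    # Compare case-insensitively, since tools differ in how they print MACs
    if [ "$seen" = "$(printf '%s' "$expected" | tr 'A-Z' 'a-z')" ]; then
        echo "ok: $ip is at $seen"
        return 0
    fi
    echo "conflict: $ip is at $seen, expected $expected"
    return 1
}
```

Run from the host as `ip neigh show | check_mac 10.42.0.174 a2:62:39:d6:f4:f3`. A non-zero exit means the address resolves to some other device or has no entry at all.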


Why The Drone Was Actually Offline

The IP conflict explains why I couldn’t reach it, but why was it offline in the swarm?

Check the drone logs from inside the container:

[2026-01-28 07:35:35] [ERROR] WATCHDOG: Build stuck for =llvm-runtimes/clang-runtime-21.1.8 (3612s). Killing drone to reset.

The drone had been building clang-runtime for just over an hour (3612 s) when the watchdog killed it. The service went down and never restarted. This had nothing to do with the IP conflict, which was a separate problem that just made debugging harder.
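Watchdog kills like this one are easy to miss in a long log. The sketch below pulls them out. It assumes every watchdog entry has exactly the shape of the single line shown above, and `watchdog_kills` is a hypothetical helper name. The log location is not given here, so the function reads from stdin.

```shell
# Print "date time package seconds" for each watchdog kill found on stdin.
# The pattern is inferred from one sample log line and may need adjusting.
watchdog_kills() {
    sed -n -E 's/^\[([^]]+)\] \[ERROR\] WATCHDOG: Build stuck for =([^ ]+) \(([0-9]+)s\)\..*$/\1 \2 \3/p'
}
```

Feed it the drone log from inside the container, for example `watchdog_kills < drone.log`, where `drone.log` stands in for wherever the drone actually writes its log.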


The Fixes

Fix 1: Restart the drone service

From the host, reach into the container and restart the service:

ssh <user>@10.42.0.194 "lxc-attach -n drone-Tau-Ceti -- rc-service swarm-drone restart"

Result:

[2026-01-28 08:20:11] [INFO] Assigned to orchestrator: 100.64.0.18:8080
[2026-01-28 08:20:12] [INFO] Building =sys-apps/smartmontools-7.5-r1...

Back online. Building packages. Good.
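A manual restart works, but the service stayed down because nothing was supervising it. One option, not something applied in this session, is OpenRC's supervise-daemon, which respawns a process that exits. The init script below is a hedged sketch: the command path and the respawn values are assumptions, and the real swarm-drone init script is not shown in this entry.

```shell
#!/sbin/openrc-run
# Hypothetical init script sketch; NOT the real /etc/init.d/swarm-drone.
description="Build swarm drone (illustrative sketch)"

# Let OpenRC's supervise-daemon own the process and respawn it when it exits
supervisor="supervise-daemon"
command="/usr/local/bin/swarm-drone"   # assumed path to the drone binary

respawn_delay=10     # seconds to wait before each respawn
respawn_max=5        # give up after 5 respawns ...
respawn_period=300   # ... within a 300-second window

depend() {
    need net
}
```

supervise-daemon expects the command to stay in the foreground, so a drone that daemonizes itself would need a flag to turn that off.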

Fix 2: Change the container’s IP

The real fix is avoiding the conflict entirely. Changed the container from .174 to .175:

# Inside the container
cat > /etc/conf.d/net << 'EOF'
config_eth0="10.42.0.175/24"
routes_eth0="default via 10.42.0.1"
dns_servers_eth0="10.42.0.1"
EOF

rc-service net.eth0 restart

Now drone-Tau-Ceti is at 10.42.0.175 and sam can keep .174.
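Before handing out a new static address, it helps to confirm nothing already answers for it. This sketch assumes the iputils version of `arping`, whose `-D` flag runs duplicate address detection. Other arping implementations use `-D` differently. `ip_is_free` is a hypothetical helper, and the `ARPING` variable exists only so the probe command can be swapped out.

```shell
# Return 0 if nothing answers ARP for the address, 1 otherwise.
# Assumes iputils arping; usually needs root or CAP_NET_RAW.
ip_is_free() {
    local ip="$1" iface="${2:-eth0}"
    # -D duplicate address detection, -q quiet, -c 2 send two probes.
    # In -D mode iputils arping exits 0 when no reply is received.
    if "${ARPING:-arping}" -D -q -c 2 -I "$iface" "$ip"; then
        echo "$ip looks free on $iface"
    else
        echo "$ip is in use on $iface (or the probe failed)"
        return 1
    fi
}
```

For example, `ip_is_free 10.42.0.175 eth0` run from a machine on the same segment. A device that is powered off at the time, such as a sleeping laptop, will not answer, so a free result is a hint and not a guarantee.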


What I Learned

  1. When the OS is wrong, you’re on the wrong machine. This seems obvious in retrospect, but in the heat of debugging, I spent 20 minutes trying to figure out why my “Gentoo container” was suddenly running Zorin.

  2. IP conflicts are invisible until you look at MAC addresses. Each sender simply delivers to whichever MAC address its ARP cache currently holds for that IP. No errors, no warnings.

  3. Container IPs need to be reserved. DHCP doesn’t know about statically configured LXC containers, so it can hand the same address to another device. Either pick static IPs outside the DHCP pool or reserve the containers’ addresses in your DHCP server.

  4. Always verify you’re on the right machine. A quick hostname or cat /etc/os-release before deep debugging would have saved me significant time.
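The check in the last point can be turned into a habit with a few lines of shell. This is a minimal sketch: `expect_os` is a hypothetical helper that reads an os-release file on stdin and compares its `ID=` field with the distribution you expect (`gentoo` for the drones here).

```shell
# Compare the ID= field of an os-release file (read from stdin) with the
# expected distribution ID. expect_os is a hypothetical helper name.
expect_os() {
    local expected="$1" id
    # ID= may be quoted or unquoted; ID_LIKE= is deliberately not matched
    id=$(sed -n -E 's/^ID="?([^"]*)"?$/\1/p' | head -n 1)
    if [ "$id" = "$expected" ]; then
        echo "ok: remote OS is $id"
    else
        echo "wrong machine? expected OS '$expected', found '${id:-unknown}'"
        return 1
    fi
}
```

Typical use is `ssh <user>@10.42.0.175 cat /etc/os-release | expect_os gentoo` before starting any deeper debugging on that host.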


The Other Problems (Same Session)

This wasn’t the only issue that day. The session also included:

  • Clang build failures blocking 7 packages (fixed with -Wno-error=maybe-uninitialized)
  • Bare-metal drone rebooting the host into Windows (disabled the bare-metal service)
  • DNS broken on drone-Mirach because Tailscale was managing /etc/resolv.conf poorly
  • Missing foundational packages on new drones (glib, cairo, harfbuzz)
  • 81 KDE slot conflicts because drone-Icarus had a full desktop environment installed (it shouldn’t)

Five hours of debugging. Most of it caused by assumptions that turned out to be wrong.


Network Topology (Updated)

After all the fixes:

| Node | IP | Status |
|---|---|---|
| drone-Icarus | 10.42.0.203 | Building |
| drone-Tau-Ceti | 10.42.0.175 | Building |
| drone-Titawin | 192.168.20.196 | Building |
| drone-Mirach | 192.168.20.77 | Bootstrapping |
| Icarus-Orchestrator | 10.42.0.201 | Primary |
| orch-Titawin | 192.168.20.118 | Secondary |

Files Changed

| File | Location | Purpose |
|---|---|---|
| /etc/conf.d/net | drone-Tau-Ceti container | Changed to .175 |
| /etc/portage/package.env | All drones | Clang workaround |
| /etc/init.d/swarm-drone | Tau-Ceti-Lab host | Renamed to .disabled |

Sometimes the bug isn’t in the code. It’s in your assumptions about which machine you’re looking at.