The Drone That Wasn’t A Drone
Date: 2026-01-28
Issue: SSH to drone reached wrong machine
Root Cause: IP address conflict between LXC container and laptop
Lesson: When the OS is wrong, check if you’re on the right machine
The Setup
The build swarm monitor shows drone-Tau-Ceti as OFFLINE. This drone runs in an LXC container on Tau-Ceti-Lab, a dual-boot machine that sometimes accidentally boots into Windows.
The container was assigned IP 10.42.0.174. Should be simple to diagnose.
The Confusion
I SSH into 10.42.0.174 to check on the drone.
ssh <user>@10.42.0.174
Wait. That’s not right.
Welcome to Zorin OS 17.2
Zorin OS? The container runs Gentoo. The host runs Gentoo. Where did Zorin come from?
I run cat /etc/os-release. Zorin. I check hostname. It says sam.
I’m not on my drone. I’m on my wife’s laptop.
The Investigation
I SSH into the actual host (10.42.0.194, the bare-metal machine running the LXC container):
ssh <user>@10.42.0.194
lxc-attach -n drone-Tau-Ceti -- cat /etc/gentoo-release
Output: Gentoo Base System release 2.18
The container is fine. It’s running Gentoo. It’s configured correctly. It’s just… unreachable via its assigned IP.
Check the ARP table:
arp -a | grep 10.42.0.174
? (10.42.0.174) at 5c:e9:31:60:88:90 [ether] on eth0
That MAC address — 5c:e9:31:60:88:90 — that’s sam’s WiFi adapter, not the container’s virtual NIC.
The container has its own MAC:
lxc-attach -n drone-Tau-Ceti -- ip addr show eth0 | grep ether
link/ether a2:62:39:d6:f4:f3
Two completely different devices. Same IP. A classic IP address conflict.
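The ARP-versus-container MAC comparison can be scripted. The following is a minimal POSIX sh sketch, not part of the swarm tooling: check_ip_owner is a hypothetical helper name, and it assumes the net-tools arp -a output format shown above (field 2 is the parenthesized IP, field 4 the MAC). Hosts that only have iproute2 would need the parsing adapted to ip neigh output.

```shell
#!/bin/sh
# check_ip_owner IP EXPECTED_MAC  (hypothetical helper)
# Reads net-tools `arp -a` output on stdin, for example:
#   ? (10.42.0.174) at 5c:e9:31:60:88:90 [ether] on eth0
# Returns 0 if IP maps to EXPECTED_MAC, 1 on a mismatch, 2 if IP is absent.
check_ip_owner() {
    ip="$1"
    # Normalize case so 5C:E9:... and 5c:e9:... compare equal.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # Field 2 is "(IP)" and field 4 is the MAC. Matching "(IP)" exactly
    # keeps 10.42.0.17 from also matching 10.42.0.174.
    actual=$(awk -v want="($ip)" '$2 == want { print tolower($4); exit }')
    if [ -z "$actual" ]; then
        echo "UNKNOWN: $ip is not in the ARP cache"
        return 2
    elif [ "$actual" = "$expected" ]; then
        echo "OK: $ip is at $actual"
        return 0
    else
        echo "CONFLICT: $ip is at $actual, expected $expected"
        return 1
    fi
}

# Example, on the Tau-Ceti-Lab host, using the container's MAC:
#   arp -a | check_ip_owner 10.42.0.174 a2:62:39:d6:f4:f3
```

With the ARP entry from this incident, the example prints CONFLICT: 10.42.0.174 is at 5c:e9:31:60:88:90, expected a2:62:39:d6:f4:f3 and returns 1. The ARP cache only holds addresses the host has talked to recently, so an UNKNOWN result usually just means the address should be pinged first.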
Why The Drone Was Actually Offline
The IP conflict explains why I couldn’t reach it, but why was it offline in the swarm?
Check the drone logs from inside the container:
[2026-01-28 07:35:35] [ERROR] WATCHDOG: Build stuck for =llvm-runtimes/clang-runtime-21.1.8 (3612s). Killing drone to reset.
The drone had been building clang-runtime for just over an hour when the watchdog killed it. Nothing restarted the service afterwards, so the drone stayed offline. This had nothing to do with the IP conflict; that was a separate problem that just made debugging harder.
The Fixes
Fix 1: Restart the drone service
From the host, reach into the container and restart the service:
ssh <user>@10.42.0.194 "lxc-attach -n drone-Tau-Ceti -- rc-service swarm-drone restart"
Result:
[2026-01-28 08:20:11] [INFO] Assigned to orchestrator: 100.64.0.18:8080
[2026-01-28 08:20:12] [INFO] Building =sys-apps/smartmontools-7.5-r1...
Back online. Building packages. Good.
Fix 2: Change the container’s IP
The real fix is avoiding the conflict entirely. Changed the container from .174 to .175:
# Inside the container
cat > /etc/conf.d/net << 'EOF'
config_eth0="10.42.0.175/24"
routes_eth0="default via 10.42.0.1"
dns_servers_eth0="10.42.0.1"
EOF
rc-service net.eth0 restart
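If other containers need the same treatment, the heredoc can be wrapped in a small function. This is only a sketch: write_netifrc is a hypothetical name, and it writes exactly the three netifrc settings used above for eth0, taking the address, gateway, and DNS server as arguments. It insists on CIDR form (matching the example above) and, like the original cat >, overwrites the target file.

```shell
#!/bin/sh
# write_netifrc ADDR/PREFIX GATEWAY DNS [FILE]  (hypothetical helper)
# Writes the same three eth0 settings as the heredoc above.
# FILE defaults to /etc/conf.d/net and is overwritten.
write_netifrc() {
    if [ "$#" -lt 3 ]; then
        echo "usage: write_netifrc ADDR/PREFIX GATEWAY DNS [FILE]" >&2
        return 1
    fi
    addr="$1"
    gw="$2"
    dns="$3"
    file="${4:-/etc/conf.d/net}"
    # This sketch requires CIDR notation, e.g. 10.42.0.175/24.
    case "$addr" in
        */*) ;;
        *)
            echo "write_netifrc: '$addr' needs a prefix length, e.g. 10.42.0.175/24" >&2
            return 1
            ;;
    esac
    # Unquoted EOF so that $addr, $gw and $dns are expanded.
    cat > "$file" <<EOF
config_eth0="$addr"
routes_eth0="default via $gw"
dns_servers_eth0="$dns"
EOF
}

# Example, inside the container:
#   write_netifrc 10.42.0.175/24 10.42.0.1 10.42.0.1 && rc-service net.eth0 restart
```

The optional FILE argument is there mainly so the function can be tried against a scratch file before touching the real config.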
Now drone-Tau-Ceti is at 10.42.0.175 and sam can keep .174.
What I Learned
- When the OS is wrong, you’re on the wrong machine. This seems obvious in retrospect, but in the heat of debugging, I spent 20 minutes trying to figure out why my “Gentoo container” was suddenly running Zorin.
- IP conflicts are invisible until you look at MAC addresses. Traffic simply goes to whichever device’s MAC ended up in the ARP cache. No errors, no warnings.
- Container IPs need to be reserved. The DHCP server doesn’t know that an LXC container has statically claimed an address, so it can hand that same address to another device. Either give containers static IPs outside the DHCP pool or reserve their addresses in your DHCP server.
- Always verify you’re on the right machine. A quick hostname or cat /etc/os-release before deep debugging would have saved me significant time.
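The first and last lessons can be turned into a guard to run before deep debugging. A minimal sketch, assuming the target uses the standard /etc/os-release format; os_id and expect_os are hypothetical helper names, not existing tools.

```shell
#!/bin/sh
# os_id [FILE]: print the ID field of an os-release file (hypothetical helper).
os_id() {
    # Values may be quoted (ID="gentoo"), so strip quote characters.
    # ^ID= does not match ID_LIKE= or VERSION_ID=.
    sed -n 's/^ID=//p' "${1:-/etc/os-release}" | tr -d "\"'"
}

# expect_os EXPECTED [FILE]: warn and return 1 if this machine's
# os-release ID is not EXPECTED.
expect_os() {
    actual=$(os_id "$2")
    if [ "$actual" != "$1" ]; then
        echo "Wrong machine? $(uname -n) reports ID=$actual, expected $1" >&2
        return 1
    fi
    return 0
}
```

Sourced on the target, expect_os gentoo would have failed on the very first command at 10.42.0.174, since the laptop runs Zorin rather than Gentoo, and the warning would have included the hostname sam.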
The Other Problems (Same Session)
This wasn’t the only issue that day. The session also included:
- Clang build failures blocking 7 packages (fixed with -Wno-error=maybe-uninitialized)
- Bare-metal drone rebooting the host into Windows (disabled the bare-metal service)
- DNS broken on drone-Mirach because Tailscale was managing /etc/resolv.conf poorly
- Missing foundational packages on new drones (glib, cairo, harfbuzz)
- 81 KDE slot conflicts because drone-Icarus had a full desktop environment installed (it shouldn’t have)
Five hours of debugging. Most of it caused by assumptions that turned out to be wrong.
Network Topology (Updated)
After all the fixes:
| Node | IP | Status |
|---|---|---|
| drone-Icarus | 10.42.0.203 | Building |
| drone-Tau-Ceti | 10.42.0.175 | Building |
| drone-Titawin | 192.168.20.196 | Building |
| drone-Mirach | 192.168.20.77 | Bootstrapping |
| Icarus-Orchestrator | 10.42.0.201 | Primary |
| orch-Titawin | 192.168.20.118 | Secondary |
Files Changed
| File | Location | Purpose |
|---|---|---|
| /etc/conf.d/net | drone-Tau-Ceti container | Changed to .175 |
| /etc/portage/package.env | All drones | Clang workaround |
| /etc/init.d/swarm-drone | Tau-Ceti-Lab host | Renamed to .disabled |
Sometimes the bug isn’t in the code. It’s in your assumptions about which machine you’re looking at.