The Drone That Wasn't A Drone
Date: 2026-01-28
Issue: SSH to drone reached wrong machine
Root Cause: IP address conflict between LXC container and laptop
Lesson: When the OS is wrong, check if you're on the right machine
The Setup
The build swarm monitor shows drone-Tau-Ceti as OFFLINE. This drone runs in an LXC container on Tau-Ceti-Lab, a dual-boot machine that sometimes accidentally boots into Windows.
The container was assigned IP 10.42.0.174. Should be simple to diagnose.
The Confusion
I SSH into 10.42.0.174 to check on the drone.
ssh [email protected]
Wait. That's not right.
Welcome to Zorin OS 17.2
Zorin OS? The container runs Gentoo. The host runs Gentoo. Where did Zorin come from?
I run cat /etc/os-release. Zorin. I check hostname. It says sam.
I'm not on my drone. I'm on my wife's laptop.
The Investigation
I SSH into the actual host (10.42.0.194, the bare-metal machine running the LXC container):
ssh [email protected]
lxc-attach -n drone-Tau-Ceti -- cat /etc/gentoo-release
Output: Gentoo Base System release 2.18
The container is fine. It's running Gentoo. It's configured correctly. It's just... unreachable via its assigned IP.
Check the ARP table:
arp -a | grep 10.42.0.174
? (10.42.0.174) at 5c:e9:31:60:XX:XX [ether] on eth0
That MAC address, 5c:e9:31:60:XX:XX, is sam's WiFi adapter, not the container's virtual NIC.
The container has its own MAC:
lxc-attach -n drone-Tau-Ceti -- ip addr show eth0 | grep ether
link/ether a2:62:39:d6:XX:XX
Two completely different devices. Same IP. Classic IP address conflict.
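The MAC comparison above can be scripted. This is a minimal sketch, assuming the neighbor table is read with `ip neigh show` (the iproute2 counterpart of `arp -a`) and that you know the MAC the container should have. The function names `neigh_mac` and `check_owner` are made up for illustration, and the MACs in the usage line are placeholders because the real ones are masked in this post.

```shell
#!/bin/sh
# neigh_mac IP
# Reads `ip neigh show` output on stdin and prints the MAC recorded for IP.
neigh_mac() {
    awk -v ip="$1" '$1 == ip {
        for (i = 2; i < NF; i++)
            if ($i == "lladdr") { print $(i + 1); exit }
    }'
}

# check_owner IP EXPECTED_MAC
# Reads the neighbor table on stdin.
# Returns 0 if IP maps to EXPECTED_MAC, 1 if another MAC answers, 2 if no entry.
check_owner() {
    seen=$(neigh_mac "$1" | tr 'A-F' 'a-f')
    want=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if [ -z "$seen" ]; then
        echo "no neighbor entry for $1"
        return 2
    fi
    if [ "$seen" = "$want" ]; then
        echo "OK: $1 is at $seen"
        return 0
    fi
    echo "CONFLICT: $1 answers from $seen, expected $want"
    return 1
}

# Usage on the host (placeholder MAC):
#   ip neigh show | check_owner 10.42.0.174 a2:62:39:d6:00:01
```

Reading the table from stdin keeps the check independent of how the table was obtained, so it can be run against saved output as well as a live host.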
Why The Drone Was Actually Offline
The IP conflict explains why I couldn't reach it, but why was it offline in the swarm?
Check the drone logs from inside the container:
[2026-01-28 07:35:35] [ERROR] WATCHDOG: Build stuck for =llvm-runtimes/clang-runtime-21.1.8 (3612s). Killing drone to reset.
The drone had been building clang for over an hour when the watchdog killed it. The service crashed and never restarted. This had nothing to do with the IP conflict; that was a separate problem that just made debugging harder.
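To spot this kind of kill without scrolling through the log, the watchdog lines can be filtered out. This is a rough sketch that assumes every watchdog entry has exactly the format of the line shown above. The function name `watchdog_kills` is hypothetical, and the log path is left as a placeholder because it is not given here.

```shell
#!/bin/sh
# watchdog_kills
# Reads drone log lines on stdin and prints "date time package seconds"
# for each line where the watchdog killed a stuck build.
# Assumes the log format matches the single example line in this post.
watchdog_kills() {
    sed -n 's/^\[\([^]]*\)] \[ERROR] WATCHDOG: Build stuck for =\([^ ]*\) (\([0-9]*\)s).*/\1 \2 \3/p'
}

# Usage from the host (log path is a placeholder):
#   lxc-attach -n drone-Tau-Ceti -- cat /path/to/drone.log | watchdog_kills
```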
The Fixes
Fix 1: Restart the drone service
From the host, reach into the container and restart the service:
ssh [email protected] "lxc-attach -n drone-Tau-Ceti -- rc-service swarm-drone restart"
Result:
[2026-01-28 08:20:11] [INFO] Assigned to orchestrator: 100.64.0.18:8080
[2026-01-28 08:20:12] [INFO] Building =sys-apps/smartmontools-7.5-r1...
Back online. Building packages. Good.
Fix 2: Change the container's IP
The real fix is avoiding the conflict entirely. Changed the container from .174 to .175:
# Inside the container
cat > /etc/conf.d/net << 'EOF'
config_eth0="10.42.0.175/24"
routes_eth0="default via 10.42.0.1"
dns_servers_eth0="10.42.0.1"
EOF
rc-service net.eth0 restart
Now drone-Tau-Ceti is at 10.42.0.175 and sam can keep .174.
What I Learned
- When the OS is wrong, you're on the wrong machine. This seems obvious in retrospect, but in the heat of debugging, I spent 20 minutes trying to figure out why my "Gentoo container" was suddenly running Zorin.
- IP conflicts are invisible until you look at MAC addresses. Traffic simply goes to whichever device's ARP reply the sender has cached. No errors, no warnings.
- Container IPs need to be reserved. The DHCP server doesn't know about LXC containers with manually assigned addresses, so it can hand the same address to another device. Either use static IPs outside the DHCP pool or reserve the containers' addresses in your DHCP server.
- Always verify you're on the right machine. A quick `hostname` or `cat /etc/os-release` before deep debugging would have saved me significant time.
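The last lesson can be turned into a guard that runs before any remote debugging step. This is a minimal sketch, not something the swarm tooling provides. The function name `on_expected_host` is made up, and the expected hostname in the usage line is an assumption about how the container is named. It uses `uname -n`, which prints the same node name as `hostname` on Linux, and reads the `ID` field from `/etc/os-release`.

```shell
#!/bin/sh
# on_expected_host HOSTNAME [OS_ID] [OS_RELEASE_FILE]
# Returns 0 only if this machine's node name is HOSTNAME and, when OS_ID is
# given, the ID field of the os-release file equals OS_ID.
on_expected_host() {
    want_host=$1
    want_os=${2:-}
    release_file=${3:-/etc/os-release}

    have_host=$(uname -n)
    if [ "$have_host" != "$want_host" ]; then
        echo "wrong machine: hostname is $have_host, expected $want_host" >&2
        return 1
    fi

    if [ -n "$want_os" ]; then
        # Source the file in a subshell so its variables do not leak.
        have_os=$(ID=; . "$release_file" 2>/dev/null; printf '%s' "$ID")
        if [ "$have_os" != "$want_os" ]; then
            echo "wrong OS: ID is '$have_os', expected '$want_os'" >&2
            return 1
        fi
    fi
    return 0
}

# Usage inside the container (hostname is an assumption):
#   on_expected_host drone-Tau-Ceti gentoo || exit 1
```

Checking both the hostname and the OS ID is deliberate: either one alone would have flagged the Zorin laptop, but together they also catch a machine that happens to share a hostname.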
The Other Problems (Same Session)
This wasn't the only issue that day. The session also included:
- Clang build failures blocking 7 packages (fixed with `-Wno-error=maybe-uninitialized`)
- Bare-metal drone rebooting the host into Windows (disabled the bare-metal service)
- DNS broken on `drone-Meridian` because Tailscale was managing `/etc/resolv.conf` poorly
- Missing foundational packages on new drones (glib, cairo, harfbuzz)
- 81 KDE slot conflicts because `drone-Izar` had a full desktop environment installed (it shouldn't)
Five hours of debugging. Most of it caused by assumptions that turned out to be wrong.
Network Topology (Updated)
After all the fixes:
| Node | IP | Status |
|---|---|---|
| drone-Izar | 10.42.0.203 | Building |
| drone-Tau-Ceti | 10.42.0.175 | Building |
| drone-Tarn | 192.168.20.196 | Building |
| drone-Meridian | 192.168.20.77 | Bootstrapping |
| Izar-Orchestrator | 10.42.0.201 | Primary |
| orch-Tarn | 192.168.20.118 | Secondary |
Files Changed
| File | Location | Purpose |
|---|---|---|
| /etc/conf.d/net | drone-Tau-Ceti container | Changed to .175 |
| /etc/portage/package.env | All drones | Clang workaround |
| /etc/init.d/swarm-drone | Tau-Ceti-Lab host | Renamed to .disabled |
Sometimes the bug isn't in the code. It's in your assumptions about which machine you're looking at.