How I Solved Gentoo’s 40-Hour Compile Problem with 4 Machines and 62 Cores
It was 11 PM on New Year’s Eve. I was watching 70 packages compile across 4 machines in my basement. The terminal read:
Building: 68
Complete: 2
Estimated: 6h 42m
By morning, everything would be compiled. No fan noise. No thermal throttling. No babysitting.
I had finally solved the problem that makes most people abandon Gentoo within a week: the compile times.
The Problem Nobody Talks About
Here’s the dirty secret of source-based distributions: they’re miserable to maintain.
The Gentoo philosophy is beautiful in theory. Compile everything from source with custom optimizations. Tailor every package to your hardware. Strip out bloat. Achieve the leanest, fastest system possible.
The reality?
$ time emerge -uDN @world
# ...48 hours later...
I timed a full system update on my desktop once. 48 hours. That’s with an i7-4790K running all 8 threads at 100% for two straight days. My office sounded like a jet engine trying to achieve liftoff, and the GPU hit 85°C just from the radiant heat.
Now imagine needing to reinstall Firefox because of a security patch. That’s a 2-hour compile every time.
Most guides say “just use binary packages” and move on. But Gentoo’s binary package support is… complicated. The official binhost has packages you probably don’t want (generic builds, different USE flags). And if you want packages optimized for your hardware, you have to compile them yourself.
So I did what any reasonable person would do: I built a distributed compilation cluster in my basement.
The Idea: Make Other Computers Do the Work
The concept was simple:
- Compile once on dedicated machines that don’t mind the heat
- Store the results as binary packages
- Install anywhere in minutes instead of hours
This isn’t new. It’s basically what the unofficial Arch binary repos do for the AUR. What Red Hat does with Koji. What Debian does with their build farms.
The difference? I was going to do it with whatever hardware I had lying around, connected via Tailscale, running on my ISP’s residential connection.
What could go wrong?
The Cast of Characters
The Drones (The Muscle)
| Name | Hardware | Cores | Role |
|---|---|---|---|
| drone-Izar | i7 VM on Proxmox | 16 | Primary builder |
| drone-Tarn | Ryzen VM | 14 | Secondary builder |
| dr-mm2 | Docker on Unraid NAS | 24 | Heavy lifter |
| Tau-Ceti-Lab | Bare-metal desktop | 8 | Backup (Windows dual-boot risk) |
Total: 62 cores available for parallel compilation.
The Orchestrator (The Brain)
A small LXC container (orchestrator-Izar) running Python. Its job:
- Maintain a queue of packages to build
- Assign work to drones based on availability
- Track what’s complete, what’s failed, what’s blocked
- Manage the binary package staging area
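The interesting part is the assignment rule: a package only goes out to a drone if everything it depends on has already been built. Stripped of the Flask plumbing and error handling, the logic is roughly this (class and method names are illustrative, not the real orchestrator code):

```python
import threading
from collections import deque

class BuildQueue:
    """Dependency-aware work queue (illustrative sketch, not the real code)."""

    def __init__(self, packages, deps):
        self.lock = threading.Lock()
        self.pending = deque(packages)    # packages still waiting to build
        self.building = {}                # package -> drone currently on it
        self.done = set()                 # packages already built
        self.deps = deps                  # package -> set of dependencies

    def next_buildable(self, drone_id):
        """Hand out the first package whose dependencies are all built."""
        with self.lock:
            for _ in range(len(self.pending)):
                pkg = self.pending.popleft()
                if self.deps.get(pkg, set()) <= self.done:
                    self.building[pkg] = drone_id
                    return pkg
                self.pending.append(pkg)  # still blocked, rotate to the back
            return None                   # nothing buildable right now

    def mark_done(self, pkg):
        with self.lock:
            self.building.pop(pkg, None)
            self.done.add(pkg)
```

Blocked packages just rotate to the back of the queue until whatever they need lands in the done set.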
The Gateway (The Router)
Another container (10.42.0.199) that:
- Provides node discovery (drones phone home here)
- Routes requests to the active orchestrator
- Handles failover if the primary orchestrator dies
- Serves the binhost URL for package downloads
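The gateway itself is a tiny Flask service. A stripped-down sketch of the registration side (endpoint names and the two-minute liveness window are illustrative; the address, port, and roles are the ones described above):

```python
import time
from flask import Flask, jsonify, request

app = Flask(__name__)
nodes = {}                                    # drone name -> last heartbeat time
ORCHESTRATORS = ["http://10.42.0.201:8080"]   # primary first, backups after

@app.route("/register", methods=["POST"])
def register():
    """Drones phone home here on boot and on every heartbeat."""
    name = request.json["name"]
    nodes[name] = time.time()
    return jsonify(orchestrator=active_orchestrator())

@app.route("/nodes")
def list_nodes():
    """Who has checked in within the last two minutes?"""
    cutoff = time.time() - 120
    return jsonify([n for n, seen in nodes.items() if seen > cutoff])

def active_orchestrator():
    # The real failover logic probes each candidate; the sketch returns the primary.
    return ORCHESTRATORS[0]

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8090)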
The Driver (Me)
My desktop (Canopus-Outpost) runs the CLI that talks to this whole mess:
build-swarm status # What's happening?
build-swarm fresh # Start a clean build
build-swarm monitor # Watch it work
The Architecture
Here’s what the data flow looks like:
┌──────────────────────┐
│ MY DESKTOP │
│ (Package Consumer) │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ GATEWAY │
│ 10.42.0.199:8090 │
│ │
│ • Node registration │
│ • API routing │
│ • Auto-failover │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ ORCHESTRATOR │
│ 10.42.0.201:8080 │
│ │
│ • Package queue │
│ • Work assignment │
│ • Build tracking │
└──────────┬───────────┘
│
┌───────────┬───────────┼───────────┬───────────┐
▼ ▼ ▼ ▼ │
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ Drone 1 │ │ Drone 2 │ │ Drone 3 │ │ Drone 4 │ │
│ 16 core │ │ 14 core │ │ 24 core │ │ 8 core │ │
│ (VM) │ │ (VM) │ │ (Docker)│ │ (Bare) │ │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
└───────────┴───────────┴───────────┘ │
│ │
▼ │
┌──────────────────────┐ │
│ BINHOST │◄──────────┘
│ (nginx) │
└──────────────────────┘
The Build Loop
- I run `build-swarm fresh` on my desktop
- The orchestrator syncs Portage on all nodes
- It calculates which packages need updates
- Packages enter a queue with dependency ordering
- Drones poll for work every 30 seconds
- Each drone (sketched below):
  - Claims a package
  - Runs `emerge --buildpkg <package>`
  - Uploads the resulting `.gpkg.tar` to staging
  - Reports success/failure
- When all packages complete, I run `build-swarm finalize`
- Staging directory → Production binhost (atomic move)
- My desktop runs `emerge --usepkg` and gets binaries
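The drone side is really just a polling loop wrapped around `emerge --buildpkg`. A simplified sketch of that loop, assuming hypothetical `/work` and `/report` endpoints on the orchestrator and an rsync push to staging:

```python
import subprocess
import time

import requests

ORCHESTRATOR = "http://10.42.0.201:8080"
POLL_INTERVAL = 30          # seconds, matching the poll interval above
DRONE_ID = "drone-Izar"

def build(package: str) -> bool:
    """Build one package and produce a binary .gpkg.tar; True on success."""
    result = subprocess.run(["emerge", "--buildpkg", package])
    return result.returncode == 0

def upload_to_staging() -> None:
    """Push freshly built binpkgs to the staging area (rsync stand-in)."""
    subprocess.run(["rsync", "-a", "/var/cache/binpkgs/",
                    "orchestrator:/srv/binpkgs-staging/"])

def main() -> None:
    while True:
        # Ask for work; an empty answer means nothing is buildable right now.
        resp = requests.get(f"{ORCHESTRATOR}/work", params={"drone": DRONE_ID})
        package = resp.json().get("package")
        if not package:
            time.sleep(POLL_INTERVAL)
            continue
        ok = build(package)
        if ok:
            upload_to_staging()
        # Report either way so the orchestrator can advance or block the queue.
        requests.post(f"{ORCHESTRATOR}/report",
                      json={"drone": DRONE_ID, "package": package, "ok": ok})

if __name__ == "__main__":
    main()
```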
The Failures (And There Were Many)
Failure #1: The Race Condition
Day 3. Two drones claimed the same package. Both compiled it. Both uploaded. The second upload overwrote the first with a corrupted file.
The fix: Atomic package claiming with server-side locking:
def claim_package(self, drone_id: str, package: str) -> bool:
    # self.lock is a threading.Lock guarding the claim table (package -> drone)
    with self.lock:
        if package in self.claimed:
            return False  # Already taken
        self.claimed[package] = drone_id
        return True
Failure #2: The Gateway Died
Week 2. The gateway container ran out of memory. None of the drones could register. The orchestrator had no workers. The entire swarm sat idle for 6 hours while I was at work.
The fix: Heartbeat monitoring. If the gateway stops responding:
- Drones cache their last-known orchestrator URL
- Orchestrator promotes itself to “standalone” mode
- I get an alert via Uptime Kuma
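On the drone side the fallback is less clever than it sounds: remember where the orchestrator was last seen and keep using that answer when the gateway goes quiet. Roughly (cache path and function name are illustrative):

```python
import json
import pathlib

import requests

GATEWAY = "http://10.42.0.199:8090"
CACHE = pathlib.Path("/var/lib/build-swarm/last_orchestrator.json")

def find_orchestrator() -> str:
    """Ask the gateway where the orchestrator is; fall back to the cached answer."""
    try:
        resp = requests.post(f"{GATEWAY}/register",
                             json={"name": "drone-Izar"}, timeout=5)
        url = resp.json()["orchestrator"]
        CACHE.write_text(json.dumps({"url": url}))   # remember for next time
        return url
    except requests.RequestException:
        # Gateway is down or unreachable: keep building against the last-known URL.
        return json.loads(CACHE.read_text())["url"]
```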
Failure #3: The Portage Sync Drift
Week 3. Drone A synced Portage. Drone B didn’t. Drone A compiled qt-base-6.7.2. Drone B tried to compile something that depended on qt-base-6.7.1 (because that’s what its tree said). Dependency collision.
The fix: The orchestrator now verifies all nodes are synced to the same Portage timestamp before starting builds:
build-swarm sync-verify
# ✓ drone-Izar: 2026-01-25 08:14:22
# ✓ drone-Tarn: 2026-01-25 08:14:22
# ✗ dr-mm2: 2026-01-24 16:30:00 ← STALE
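Under the hood, `sync-verify` just compares the Portage tree’s sync timestamp across nodes. A minimal sketch, assuming each drone exposes the contents of its `metadata/timestamp.chk` over a hypothetical `/portage-timestamp` endpoint (the agent addresses below are made up):

```python
from collections import Counter

import requests

# Hypothetical per-drone agent addresses; the real ones live in the swarm config.
DRONES = {
    "drone-Izar": "http://10.42.0.211:8070",
    "drone-Tarn": "http://10.42.0.212:8070",
    "dr-mm2":     "http://10.42.0.213:8070",
}

def verify_sync() -> bool:
    """Refuse to start builds unless every node reports the same Portage timestamp."""
    stamps = {
        name: requests.get(f"{url}/portage-timestamp", timeout=10).text.strip()
        for name, url in DRONES.items()
    }
    reference, _ = Counter(stamps.values()).most_common(1)[0]  # majority timestamp
    for name, stamp in stamps.items():
        marker = "✓" if stamp == reference else "✗ STALE"
        print(f"{marker} {name}: {stamp}")
    return len(set(stamps.values())) == 1
```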
Failure #4: The Bare-Metal Brick
Week 4. Tau-Ceti-Lab is my one bare-metal drone. It dual-boots Windows and Gentoo. The swarm has an “auto-restart on stuck build” feature. It restarted Tau-Ceti-Lab.
It booted into Windows.
The fix: A config flag:
# /etc/build-swarm/drone.conf
AUTO_REBOOT=false # Tau-Ceti-Lab is special
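The drone service checks that flag before it’s allowed to pull the trigger. Roughly (the config is plain KEY=value, so parsing it is nothing fancy; the guard function is illustrative):

```python
import pathlib
import subprocess

CONF = pathlib.Path("/etc/build-swarm/drone.conf")

def auto_reboot_allowed() -> bool:
    """Parse the KEY=value drone config and honour AUTO_REBOOT=false."""
    for line in CONF.read_text().splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if line.startswith("AUTO_REBOOT="):
            return line.split("=", 1)[1].strip().lower() == "true"
    return True  # default behaviour: stuck builds may trigger a restart

def recover_stuck_build() -> None:
    if auto_reboot_allowed():
        subprocess.run(["reboot"])
    else:
        print("Build looks stuck, but AUTO_REBOOT=false - leaving the box alone.")
```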
Failure #5: The Disk Full
Month 2. Drones were keeping old build artifacts. /var/cache/binpkgs filled up. Builds started failing with cryptic “disk write error” messages that took hours to diagnose.
The fix: Auto-cleanup after successful uploads. Drones don’t hoard packages anymore.
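The cleanup itself is a handful of lines in the drone’s upload path: once a package is confirmed in staging, the local copy goes. Something like this (the binpkg path follows the Portage default; the bookkeeping is illustrative):

```python
import pathlib

BINPKG_DIR = pathlib.Path("/var/cache/binpkgs")

def cleanup_after_upload(uploaded: set[pathlib.Path]) -> int:
    """Delete local .gpkg.tar files that have already been uploaded to staging."""
    freed = 0
    for pkg in BINPKG_DIR.rglob("*.gpkg.tar"):
        if pkg in uploaded:
            freed += pkg.stat().st_size
            pkg.unlink()            # the binhost copy is now the only copy
    return freed                    # bytes reclaimed, handy for logging
```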
The Payoff
Here’s what my system update looks like now:
$ build-swarm fresh
═══ GENTOO BUILD SWARM ═══
Starting fresh build...
✓ Cleared staging directory
✓ Reset orchestrator state
✓ Synced portage on all nodes
✓ Discovered 83 needed packages
Build started. Monitor with: build-swarm monitor
Then I go to bed.
$ build-swarm status
═══ BUILD SWARM STATUS ═══
Gateway: ✓ 10.42.0.199
Orchestrator: ✓ 10.42.0.201 (API Online)
Build Progress:
Needed: 0
Building: 0
Complete: 83
Blocked: 0
═══ NEXT ACTION ═══
All builds complete!
→ Run: build-swarm finalize
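finalize’s job is the “atomic move” from the build loop: staging becomes production in one step, so the binhost never serves a half-written package set. One way to get that atomicity is a symlink flip; a sketch under that assumption (paths and the `finalize` signature are illustrative, not necessarily how the real command does it):

```python
import os
import pathlib

STAGING = "/srv/binpkgs-staging"
RELEASES = pathlib.Path("/srv/binpkgs-releases")
CURRENT = pathlib.Path("/srv/binpkgs")        # what nginx actually serves (a symlink)

def finalize(build_id: str) -> None:
    """Promote staging to production by flipping a symlink in one rename."""
    release = RELEASES / build_id
    os.rename(STAGING, release)               # move the finished staging dir aside
    tmp_link = CURRENT.with_suffix(".new")
    tmp_link.unlink(missing_ok=True)          # clear debris from a crashed run
    tmp_link.symlink_to(release)              # new symlink built next to the old one
    os.replace(tmp_link, CURRENT)             # atomic swap: nginx now serves the new dir
    os.makedirs(STAGING, exist_ok=True)       # fresh, empty staging for the next run
```

The point of the symlink dance is that clients either see the old binhost or the new one, never a directory mid-copy.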
Then on my desktop:
$ sudo apkg update
Updating from binhost... [83 packages]
>>> Installing www-client/firefox-133.0.3
>>> Installing app-office/libreoffice-24.8.4
...
Completed in 4m 32s
4 minutes. Not 48 hours. Not even 4 hours. Four minutes.
The Monitor
I built a TUI (Terminal User Interface) because I like watching things work:
╔═════════════════════════════════════════════════════════╗
║ 🐝 GENTOO BUILD SWARM v2.6 ║
╠═════════════════════════════════════════════════════════╣
║ ORCH: 10.42.0.201 (primary) GATE: 10.42.0.199 (ok) ║
╠═════════════════════════════════════════════════════════╣
║ DRONE │ CORES │ STATUS ║
╠═════════════════════════════════════════════════════════╣
║ 🟢 drone-1 │ 16 │ Building: dev-libs/openssl ║
║ 🟢 drone-2 │ 14 │ Building: www-client/firefox ║
║ 🟢 drone-3 │ 24 │ Building: kde-plasma/plasma... ║
║ 🟢 drone-4 │ 8 │ Idle ║
╠═════════════════════════════════════════════════════════╣
║ QUEUE: 47 │ DONE: 36 │ BLOCKED: 0 │ ETA: 2h 14m ║
╠═════════════════════════════════════════════════════════╣
║ [q] Quit [b] Balance [u] Unblock [R] Reset ║
╚═════════════════════════════════════════════════════════╝
It’s oddly satisfying to watch package counts climb while doing nothing.
Would I Recommend This?
For your job: Absolutely not. Use containers. Use CI/CD. Use literally anything from this decade.
For your home lab: Maybe.
Here’s my honest assessment:
You Should Consider This If:
- You run Gentoo on 2+ machines
- You have spare hardware (VMs count)
- You find distributed systems problems interesting
- You’re okay with things breaking
You Should NOT Do This If:
- You just want a Linux desktop that works
- You value your free time
- You don’t enjoy debugging at 2 AM
- You’re sane
The Build Swarm took about 60 hours to build and has saved me maybe 200 hours of compile time over 6 months. The ROI is positive, but barely.
The real value was the learning. I now understand:
- How package managers work at a low level
- How distributed task queues operate
- How to handle network partitions and node failures
- Why Kubernetes is actually pretty impressive (it does all this and more)
The Stack
For the curious, here’s what runs the swarm:
| Component | Technology |
|---|---|
| Drones | Python service + OpenRC |
| Orchestrator | Python + Flask API |
| Gateway | Python + Flask |
| Networking | Tailscale mesh VPN |
| Binhost | nginx static file serving |
| Monitoring | Custom TUI + Uptime Kuma |
| Code deploy | Git + SSH |
| Package format | Gentoo .gpkg.tar |
Total lines of Python: ~4,500
Total headaches: Countless
Packages compiled while I slept: 2,847 (and counting)
The Philosophy
Gentoo’s official position is “compile everything yourself.” My position is “I’d rather have the computer compile everything while I sleep.”
The Build Swarm is my compromise. I still get source-based optimization. I still control every USE flag. I still compile from upstream.
I just don’t have to watch.
This post is part of the Argo OS Journey series, documenting the creation of a custom Gentoo-based distribution across my home lab.
Related Posts:
- apkg: The Teaching Package Manager — The wrapper that makes Portage bearable
- Hardening the Build Swarm — Ghost drones, NAT confusion, and fleet security
- Btrfs & Snapper Guide — The snapshot system that saves builds