Complete reference for operating the Gentoo Build Swarm - daily operations, workflows, and troubleshooting
Gentoo Build Swarm - Complete Handbook
Version: 2.6
This is the complete reference for operating the Gentoo Build Swarm. Start here.
Quick Start
The One Command You Need
build-swarm status
This shows you what’s happening AND tells you what to do next:
═══ BUILD SWARM STATUS ═══
Gateway: ✓ 10.42.0.199
Orchestrator: ✓ 10.42.0.201 (API Online)
Build Progress:
Needed: 0
Building: 2
Complete: 75
Blocked: 1
═══ NEXT ACTION ═══
⚠ 1 package(s) blocked:
• =www-client/brave-browser-1.86.139
→ Run: build-swarm fix-blocked
The Complete Workflow
- Run build-swarm status
- Do whatever it tells you
- Repeat
That’s it.
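The loop above can be partly scripted. The following is a minimal sketch, assuming the suggestion always appears on a line of the form "→ Run: <command>" as in the sample output shown earlier; the helper name `next_action` is hypothetical and not part of build-swarm.

```shell
#!/bin/sh
# Sketch only: extract the suggested command from `build-swarm status` output.
# Assumption: the suggestion is printed as "→ Run: <command>", as in the
# sample status output in this handbook. The real format may differ.

# next_action (hypothetical helper): reads status text on stdin and prints
# the first suggested command, or nothing if no suggestion is present.
next_action() {
    sed -n 's/^.*→ Run: *//p' | head -n 1
}

# Typical use (not executed here):
#   build-swarm status | next_action
```

Printing the command rather than executing it keeps a human in the loop, which seems prudent for actions such as resets.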
Daily Operations
Starting a Fresh Build
When you want to update your system:
# Option A: Fresh start (recommended after issues)
build-swarm fresh
# Option B: Continue from previous state
build-swarm release
What fresh does:
- Clears staging directory
- Resets orchestrator state
- Syncs portage trees on all nodes
- Discovers needed packages
- Distributes builds to drones
Monitoring Progress
# Interactive TUI dashboard
build-swarm monitor
# Or just check status
build-swarm status
Monitor keybindings:
- q: Quit
- b: Balance workload
- u: Unblock failed packages
- R: Reset swarm (careful!)
When Builds Complete
# 1. Verify the build is safe
build-swarm verify
# 2. If Risk Score < 10, release to production
build-swarm finalize
# 3. Update your desktop
sudo apkg update
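The Risk Score gate in step 2 can be expressed as a small check. This is a sketch under assumptions: the handbook does not document how `build-swarm verify` reports the score, so the score is passed in by hand here, and `should_finalize` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: decide whether a Risk Score permits a release.
# Assumption: you read the score from `build-swarm verify` yourself;
# its output format is not documented in this handbook.

# should_finalize (hypothetical helper):
#   $1 = Risk Score, $2 = threshold (defaults to 10, per step 2 above)
# Returns 0 if score < threshold, 1 if not, 2 if the score is not a
# plain non-negative integer.
should_finalize() {
    case "$1" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$1" -lt "${2:-10}" ]
}

# Typical use (not executed here):
#   should_finalize 4 && build-swarm finalize
```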
Handling Blocked Packages
# Diagnose the issue
build-swarm fix-blocked
# Common fixes:
build-swarm sync-overlays # Missing overlay
build-swarm build-local <pkg> # Kernel-specific (nvidia-drivers)
build-swarm unblock # Retry transient failures
build-swarm sync-fix # Out-of-sync portage trees
Build Workflows
Standard Update Workflow
┌────────────────────────────────────────────────────────────┐
│                      YOUR TYPICAL DAY                      │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  1. build-swarm fresh     # Start a clean build            │
│         │                                                  │
│         ▼                                                  │
│  2. build-swarm monitor   # Watch progress (optional)      │
│         │                                                  │
│         ▼                                                  │
│  3. build-swarm status    # Check when complete            │
│         │                                                  │
│         ▼                                                  │
│  4. build-swarm verify    # Safety check                   │
│         │                                                  │
│         ▼                                                  │
│  5. build-swarm finalize  # Release to production          │
│         │                                                  │
│         ▼                                                  │
│  6. sudo apkg update      # Update your desktop            │
│                                                            │
└────────────────────────────────────────────────────────────┘
Quick Reference Table
| I want to… | Run this |
|---|---|
| Check what’s happening | build-swarm status |
| Start a fresh build | build-swarm fresh |
| Watch builds live | build-swarm monitor |
| See blocked packages | build-swarm fix-blocked |
| Retry failed packages | build-swarm unblock |
| Check if release is safe | build-swarm verify |
| Release to production | build-swarm finalize |
| Update my desktop | sudo apkg update |
| Update a drone | build-swarm update-drone <name> |
Staging vs Production
The swarm uses a two-tier storage model for safety:
STAGING (/var/cache/binpkgs-staging/)
↓ Drones upload here
↓ NOT served to clients
↓ Work in progress
──── build-swarm finalize ────
PRODUCTION (/var/cache/binpkgs/)
↓ Nginx serves from here
↓ Atomic release (all or nothing)
↓ What clients actually see
Key point: Packages in staging are invisible to your desktop until you run finalize.
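To see how much is waiting in staging compared with what clients can already fetch, you can count the package files in each directory. A minimal sketch, assuming packages use the standard Gentoo binary package extensions (`.tbz2` or `.gpkg.tar`); `count_binpkgs` is a hypothetical helper, not a build-swarm command.

```shell
#!/bin/sh
# Sketch only: count binary packages under a directory.
# Assumption: packages are stored as *.tbz2 or *.gpkg.tar files.

# count_binpkgs (hypothetical helper): prints the number of binary
# package files found under the directory given as $1.
count_binpkgs() {
    find "$1" -type f \( -name '*.tbz2' -o -name '*.gpkg.tar' \) 2>/dev/null \
        | wc -l | tr -d ' '
}

# Typical use on the binhost (not executed here):
#   echo "staging:    $(count_binpkgs /var/cache/binpkgs-staging)"
#   echo "production: $(count_binpkgs /var/cache/binpkgs)"
```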
Command Reference
Status & Monitoring
| Command | Description |
|---|---|
| build-swarm status | Show current status and next action |
| build-swarm monitor | Launch interactive TUI dashboard |
| build-swarm info | Show connection info (Gateway IP, Orchestrator IP) |
| build-swarm logs <drone> | Stream live build logs from a drone |
Build Operations
| Command | Description |
|---|---|
| build-swarm release | Full pipeline: sync → build → verify → finalize |
| build-swarm fresh | Reset everything and start clean build |
| build-swarm verify | Run safety checks (Risk Score) |
| build-swarm finalize | Move staging → production (release packages) |
Troubleshooting
| Command | Description |
|---|---|
| build-swarm fix-blocked | Diagnose blocked packages and suggest fixes |
| build-swarm unblock | Retry all blocked packages |
| build-swarm sync-overlays | Sync custom overlays to all drones |
| build-swarm sync-verify | Check if all drones have synced portage |
| build-swarm sync-fix | Auto-fix out-of-sync drones |
| build-swarm sync-all | Full portage sync (upstream + all nodes) |
| build-swarm build-local <pkg> | Build package locally and upload |
Infrastructure Management
| Command | Description |
|---|---|
| build-swarm push | Git pull + deploy code to all nodes |
| build-swarm push drones | Deploy only to drones |
| build-swarm push <name> | Deploy to specific node |
| build-swarm rename <ip\|name> <new> | Rename a node |
| build-swarm test | Run full system integration tests |
| build-swarm stress [N] | Stress test with N dummy packages |
Code Deployment
# The standard workflow for updating swarm code:
cd ~/Development/gentoo-build-swarm
git add -A && git commit -m "Your changes" && git push # Save to Gitea
build-swarm push # Deploy to all nodes
Troubleshooting
Package Build Failed
# 1. See what's blocked
build-swarm fix-blocked
# 2. Check drone logs for details
build-swarm logs drone-Izar-Host
# 3. Fix based on error type:
| Error Type | Fix |
|---|---|
| Missing overlay | build-swarm sync-overlays |
| Portage tree mismatch | build-swarm sync-fix |
| Kernel-specific (nvidia) | build-swarm build-local nvidia-drivers |
| Transient failure | build-swarm unblock |
| USE flag mismatch | Update /etc/portage/, then run build-swarm fresh |
Drone Offline
# Check if drone can reach gateway
ssh root@<drone-ip> 'curl -s http://10.42.0.199:8090/health'
# Check service status
ssh root@<drone-ip> 'rc-service swarm-drone status'
# Check logs
ssh root@<drone-ip> 'tail -50 /var/log/build-swarm/drone.log'
# Restart if needed
ssh root@<drone-ip> 'rc-service swarm-drone restart'
Orchestrator Unreachable
# Check orchestrator status
build-swarm status
# If the primary is down, the gateway auto-routes to the backup
# Check which orchestrator is active:
curl -s http://10.42.0.199:8090/api/v1/orchestrator
Packages Stuck in Staging
# Check staging count
build-swarm status
# If builds complete but packages not visible:
build-swarm verify # Check if safe
build-swarm finalize # Move to production
Configuration
Drone Configuration
File: /etc/build-swarm/drone.conf
# Required
GATEWAY_URL="http://10.42.0.199:8090"
# Optional
NODE_NAME="my-drone" # Display name (default: hostname)
REPORT_IP="100.x.x.x" # Override reported IP (Tailscale)
UPLOAD_HOST="100.x.x.x" # Override upload destination
HEARTBEAT_INTERVAL=30 # Seconds between gateway heartbeats
POLL_INTERVAL=30 # Seconds between work polling
AUTO_REBOOT=true # Kill on stuck builds (1hr timeout)
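Since GATEWAY_URL is the only required setting, a quick sanity check before restarting a drone can catch an incomplete file. This is a sketch that assumes drone.conf is plain shell `KEY="value"` syntax, as the listing above suggests; `check_drone_conf` is a hypothetical helper. Sourcing runs the file as shell code, so only use it on config you trust.

```shell
#!/bin/sh
# Sketch only: verify that a drone config sets the required GATEWAY_URL.
# Assumption: the file is shell-sourceable KEY="value" lines.

# check_drone_conf (hypothetical helper): succeeds if the file given as $1
# is readable and sets a non-empty GATEWAY_URL. The body runs in a
# subshell so sourced variables do not leak into the caller.
check_drone_conf() (
    GATEWAY_URL=""
    [ -r "$1" ] || exit 1
    . "$1" || exit 1
    [ -n "$GATEWAY_URL" ]
)

# Typical use (not executed here):
#   check_drone_conf /etc/build-swarm/drone.conf || echo "GATEWAY_URL missing"
```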
Portage Configuration (Drones)
File: /etc/portage/make.conf
# Set to core count
MAKEOPTS="-j16"
# Required features
FEATURES="buildpkg fail-clean -getbinpkg -binpkg-multi-instance"
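Rather than hard-coding the job count, you can derive it from the machine. A minimal sketch, assuming "core count" means the number of online processing units reported by the system; `suggest_makeopts` is a hypothetical helper and only prints a line for you to copy into make.conf.

```shell
#!/bin/sh
# Sketch only: print a MAKEOPTS line matched to this machine.
# Assumption: the number of online processing units is the right job count.

# suggest_makeopts (hypothetical helper): uses nproc (GNU coreutils) and
# falls back to getconf where nproc is unavailable.
suggest_makeopts() {
    cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    printf 'MAKEOPTS="-j%s"\n' "$cores"
}

# Typical use (not executed here):
#   suggest_makeopts
```

On memory-constrained drones a lower value may be safer, since large packages can use well over a gigabyte of RAM per job.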
Orchestrator Configuration
File: /etc/build-swarm/orchestrator.conf
GATEWAY_URL="http://10.42.0.199:8090"
ORCHESTRATOR_PORT=8080
BUILD_MODE="delegate_first" # delegate_only, delegate_first, hybrid
Maintenance
Weekly Tasks
# Check for blocked packages
build-swarm status
# Verify all drones are synced
build-swarm sync-verify
# Check drone disk space
for drone in drone-Izar-Host drone-Tarn; do
    ssh "root@$drone" 'df -h /var/cache/binpkgs'
done
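If you would rather get a number than read `df` output by eye, the disk check can be reduced to a percentage. A sketch under assumptions: `disk_use_pct` is a hypothetical helper meant to run on the drone itself, and it assumes the device name contains no spaces.

```shell
#!/bin/sh
# Sketch only: report used disk space as a bare percentage.
# Assumption: the filesystem's device name contains no spaces.

# disk_use_pct (hypothetical helper): prints the used-space percentage
# (digits only) of the filesystem holding the path given as $1.
disk_use_pct() {
    # -P forces POSIX output so each filesystem stays on one line.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Typical use (not executed here; the 90 threshold is arbitrary):
#   [ "$(disk_use_pct /var/cache/binpkgs)" -gt 90 ] && echo "disk nearly full"
```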
Updating Swarm Code
cd ~/Development/gentoo-build-swarm
git pull # Get latest
build-swarm push # Deploy to all nodes
Adding a New Drone
# Method A: Remote installation
build-swarm add drone Worker-01 --ip 10.42.0.50
# Method B: Local installation (on the drone)
git clone https://github.com/Arcturus-Prime/gentoo-build-swarm.git
cd gentoo-build-swarm
sudo ./install.sh drone 10.42.0.199
Renaming Nodes
build-swarm rename 10.42.0.184 drone-Tau-Host
build-swarm rename drone-old drone-new
Quick Reference Card
┌────────────────────────────────────────────────────────────┐
│            GENTOO BUILD SWARM - QUICK REFERENCE            │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  CHECK STATUS:    build-swarm status                       │
│  START BUILD:     build-swarm fresh                        │
│  WATCH PROGRESS:  build-swarm monitor                      │
│  FIX PROBLEMS:    build-swarm fix-blocked                  │
│  VERIFY SAFE:     build-swarm verify                       │
│  RELEASE:         build-swarm finalize                     │
│  UPDATE DESKTOP:  sudo apkg update                         │
│                                                            │
│  DEPLOY CODE:     build-swarm push                         │
│  VIEW LOGS:       build-swarm logs <drone>                 │
│  SYNC PORTAGE:    build-swarm sync-all                     │
│                                                            │
├────────────────────────────────────────────────────────────┤
│  Gateway:         10.42.0.199:8090                         │
│  Orchestrator:    10.42.0.201:8080                         │
│  Binhost:         http://10.42.0.201/packages              │
└────────────────────────────────────────────────────────────┘