Gateway & Orchestrator
The coordination layer that routes builds, manages queues, and exposes the swarm API
The build swarm has two brains: the Orchestrator that decides what gets built and where, and the Gateway that handles discovery, API access, and public endpoints. They run on separate machines, by design — the orchestrator needs to stay focused on build coordination while the gateway handles the messy world of HTTP routing and external access.
Note: These docs use real hostnames and IPs throughout. The public API sanitizes these via the Galactic Identity System before exposing them externally — see the Public API section for details on what that looks like.
Architecture Overview
Internet / Cloudflare Tunnel
│
▼
┌──────────────────────────┐
│ Gateway (Altair-Link) │
│ 10.42.0.199:8090/8100 │
│ │
│ - Discovery/API │
│ - Public endpoints │
│ - Health checking │
│ - Request routing │
└────────────┬─────────────┘
│
▼
┌──────────────────────────┐
│ Orchestrator (Izar-Host) │
│ 10.42.0.201:8080 │
│ │
│ - Build queue mgmt │
│ - Drone coordination │
│ - State tracking │
│ - Package distribution │
└──────────────────────────┘
The Orchestrator
Location and Access
The orchestrator runs on the Izar-Host Proxmox hypervisor at 10.42.0.201:8080 on the Milky Way. It’s the single source of truth for build state — which packages need building, which drones are working on what, and what’s already done.
What It Does
The orchestrator’s job is coordination, not routing. It doesn’t handle external requests directly; that’s the gateway’s problem. The orchestrator:
- Manages the build queue: Accepts package lists, deduplicates, resolves dependency ordering, and assigns packages to drones based on core count and current load
- Tracks drone state: Maintains a real-time view of every drone’s status (ready, building, degraded, offline) via 30-second heartbeat checks
- Handles failover: When a drone drops out, the orchestrator re-queues its pending packages and redistributes them to remaining drones within 90 seconds
- Tracks build artifacts: Records which packages have been successfully built and synced to the binhost at 10.42.0.194
- Exposes internal API: Full build state is available at http://10.42.0.201:8080/api/* for internal consumption
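The heartbeat and failover bookkeeping above can be sketched as a small tracker. This is a hypothetical illustration, not the orchestrator's code: the DroneTracker name and its methods are invented, and treating 90 seconds of silence (three missed 30-second heartbeats) as "offline" is an assumption chosen to match the 90-second failover window.

```javascript
// Hypothetical sketch of heartbeat tracking with failover; not the orchestrator's real code.
const HEARTBEAT_INTERVAL_MS = 30_000;               // heartbeat cadence from the docs
const OFFLINE_AFTER_MS = 3 * HEARTBEAT_INTERVAL_MS; // assumption: 90 s of silence means offline

class DroneTracker {
  constructor() {
    this.drones = new Map(); // id -> { status, lastSeen, assigned: Set of package atoms }
    this.queue = [];         // packages waiting to be (re)assigned
  }

  // Record a heartbeat; unknown drones are registered on first contact.
  heartbeat(id, status, now) {
    const drone = this.drones.get(id) ?? { status, lastSeen: now, assigned: new Set() };
    drone.status = status;
    drone.lastSeen = now;
    this.drones.set(id, drone);
  }

  assign(id, pkg) {
    const drone = this.drones.get(id);
    if (!drone) throw new Error(`unknown drone ${id}`);
    drone.assigned.add(pkg);
  }

  // Run periodically: mark silent drones offline and put their work back on the queue.
  sweep(now) {
    const requeued = [];
    for (const drone of this.drones.values()) {
      if (drone.status !== 'offline' && now - drone.lastSeen >= OFFLINE_AFTER_MS) {
        drone.status = 'offline';
        requeued.push(...drone.assigned);
        drone.assigned.clear();
      }
    }
    this.queue.push(...requeued);
    return requeued;
  }
}
```

Redistribution then happens whenever the scheduler drains the queue. A drone that heartbeats again after being marked offline simply re-registers with its reported status; the packages it lost stay with whichever drones picked them up.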
Orchestrator API Endpoints
These are internal-only endpoints, accessible from the Milky Way or via Tailscale:
| Endpoint | Method | Description |
|---|---|---|
| /api/status | GET | Orchestrator health and version |
| /api/queue | GET | Current build queue with package states |
| /api/queue | POST | Submit new packages for building |
| /api/drones | GET | All registered drones with status |
| /api/drones/{id} | GET | Single drone detail |
| /api/builds | GET | Build history (last 100) |
| /api/builds/{id} | GET | Single build detail with logs |
| /api/packages | GET | All known packages and their binary status |
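A script on the Milky Way (or on Tailscale) could call these endpoints through a tiny client like the one below. Only the paths come from the table; the function names, the { packages: [...] } POST body, and the injectable fetchImpl parameter (which lets the client run without a network) are assumptions. It relies on the global fetch available in Node 18+.

```javascript
// Hypothetical client for the orchestrator's internal API.
// Paths come from the endpoint table; the POST body shape is an assumption.
const ORCHESTRATOR_URL = 'http://10.42.0.201:8080';

function createOrchestratorClient(baseUrl = ORCHESTRATOR_URL, fetchImpl = globalThis.fetch) {
  async function request(method, path, body) {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method,
      headers: body ? { 'Content-Type': 'application/json' } : {},
      body: body ? JSON.stringify(body) : undefined,
    });
    if (!res.ok) throw new Error(`${method} ${path} failed: HTTP ${res.status}`);
    return res.json();
  }

  return {
    status: () => request('GET', '/api/status'),
    drone: (id) => request('GET', `/api/drones/${encodeURIComponent(id)}`),
    // Assumption: the queue accepts a JSON object with a `packages` array.
    submit: (packages) => request('POST', '/api/queue', { packages }),
  };
}
```

For example, createOrchestratorClient().status().then(console.log) prints the orchestrator's health and version.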
Queue Management
When a build request arrives (usually from build-swarm submit on Capella-Outpost), the orchestrator:
- Parses the package list from the emerge -puDN @world output
- Deduplicates against packages already built or currently building
- Resolves dependency ordering: if dev-libs/openssl is needed by 15 other packages, it builds first
- Scores each package by estimated compile time (based on historical data)
- Assigns packages to drones: big packages to high-core drones, small packages to whatever’s free
- Tracks progress and handles failures with automatic retry (up to 2 retries per package)
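The planning steps above can be sketched as four small functions. This illustrates the steps under stated assumptions and is not the orchestrator's code. It assumes emerge's pretend output has lines like "[ebuild     U  ] dev-libs/openssl-3.3.2 [3.3.1]". The dependency map and per-package costs come from the caller; costs are in core-minutes, standing in for the historical compile-time data. Ordering uses Kahn's algorithm, and assignment follows a greedy earliest-finish rule.

```javascript
// Hypothetical sketch of the queue-planning steps; not the orchestrator's actual code.

// 1. Parse package atoms from `emerge -puDN @world` output (assumed line format).
function parseEmergeOutput(text) {
  const atoms = [];
  for (const line of text.split('\n')) {
    const m = line.match(/^\[(?:ebuild|binary)[^\]]*\]\s+(\S+)/);
    if (m) atoms.push(m[1].replace(/::.*$/, '')); // drop an optional ::repo suffix
  }
  return atoms;
}

// 2. Deduplicate against packages already built or currently building.
function dedupe(atoms, built, building) {
  return [...new Set(atoms)].filter((a) => !built.has(a) && !building.has(a));
}

// 3. Dependency ordering (Kahn's algorithm), grouped into waves that can build in parallel.
//    deps maps a package to the packages it needs; deps outside the list count as satisfied.
function planWaves(pkgs, deps) {
  const inList = new Set(pkgs);
  const remaining = new Map(pkgs.map((p) => [p, (deps[p] ?? []).filter((d) => inList.has(d)).length]));
  const waves = [];
  while (remaining.size > 0) {
    const wave = [...remaining].filter(([, n]) => n === 0).map(([p]) => p);
    if (wave.length === 0) throw new Error('dependency cycle detected');
    waves.push(wave);
    for (const p of wave) remaining.delete(p);
    for (const [p, n] of remaining) {
      remaining.set(p, n - (deps[p] ?? []).filter((d) => wave.includes(d)).length);
    }
  }
  return waves;
}

// 4-5. Largest jobs first, each to the drone whose projected finish time
//      (queued core-minutes divided by its cores) would be lowest.
function assignWave(wave, drones, costCoreMinutes) {
  if (drones.length === 0) throw new Error('no drones available');
  const cost = (pkg) => costCoreMinutes[pkg] ?? 1; // assumption: unknown packages cost 1
  const load = new Map(drones.map((d) => [d.name, 0]));
  const plan = [];
  for (const pkg of [...wave].sort((a, b) => cost(b) - cost(a))) {
    let best = drones[0];
    let bestFinish = Infinity;
    for (const d of drones) {
      const finish = (load.get(d.name) + cost(pkg)) / d.cores;
      if (finish < bestFinish) { best = d; bestFinish = finish; }
    }
    load.set(best.name, load.get(best.name) + cost(pkg));
    plan.push({ pkg, drone: best.name, retriesLeft: 2 }); // up to 2 retries per package
  }
  return plan;
}
```

Dividing by core count is what sends large builds to high-core drones: a 160 core-minute job projects to 10 minutes on a 16-core drone but 40 on a 4-core one, while small jobs fill whichever drone has the least queued work.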
The queue is persistent. If the orchestrator restarts, it recovers queue state and resumes. No builds are lost.
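A minimal sketch of how queue state can survive a restart, assuming a single JSON file on local disk. The sketch writes to a temporary file and renames it over the real one. On startup, anything that was mid-build goes back into pending. The file layout, function names, and rebuild-from-scratch policy are assumptions; the docs only say that state is recovered and no builds are lost.

```javascript
// Hypothetical sketch: persisting queue state so a restart can resume.
const fs = require('node:fs');

// Write to a temp file, then rename: on POSIX filesystems the rename replaces the target
// in one step, so a process crash mid-write never leaves a truncated queue file.
// (Surviving power loss as well would also need an fsync before the rename.)
function saveQueue(file, state) {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state));
  fs.renameSync(tmp, file);
}

function loadQueue(file) {
  if (!fs.existsSync(file)) return { pending: [], building: [] };
  const state = JSON.parse(fs.readFileSync(file, 'utf8'));
  // Assumption: packages that were mid-build when the process died are rebuilt from scratch.
  return { pending: [...state.building, ...state.pending], building: [] };
}
```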
The Gateway
Location and Access
The gateway runs on Altair-Link at 10.42.0.199, serving on two ports:
- Port 8090: Primary API and discovery endpoint
- Port 8100: Secondary/admin interface
Altair-Link is the services host on the Milky Way — it runs Docker containers for the gateway, Command Center, monitoring stack (Prometheus, Loki, Grafana), and other infrastructure services. The gateway is just one container among many on this box.
What It Does
The gateway is the swarm’s public face. It handles:
- Service discovery: Clients (including the Arcturus-Prime website) query the gateway to find the active orchestrator and current swarm state
- Request routing: Incoming build requests are validated and forwarded to the orchestrator
- Health checking: The gateway probes the orchestrator every 30 seconds and maintains its own view of orchestrator availability
- Public API: Exposes sanitized swarm data for external consumption through the Galactic Identity System
- Admin API: Full, unsanitized access for authenticated administrators
- Binhost URL: Returns the current binhost URL so clients know where to pull binary packages
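The gateway's orchestrator probe can be sketched as a small state machine. The 30-second interval comes from this page, and the three-failure threshold comes from Common Issues. The rest are assumptions: the failures must be consecutive, one success flips the state back to up, and requests time out after 5 seconds. The probe is injected so the logic can be exercised without a live orchestrator.

```javascript
// Hypothetical sketch of the gateway's orchestrator health check.
const PROBE_INTERVAL_MS = 30_000; // from the docs
const FAILURES_BEFORE_DOWN = 3;   // from the docs; "consecutive" is an assumption

class OrchestratorHealth {
  constructor(probe) {
    this.probe = probe; // async () => boolean
    this.consecutiveFailures = 0;
    this.up = true;
  }

  async check() {
    let ok = false;
    try {
      ok = await this.probe();
    } catch {
      ok = false; // network errors count as failures
    }
    if (ok) {
      this.consecutiveFailures = 0;
      this.up = true;
    } else if (++this.consecutiveFailures >= FAILURES_BEFORE_DOWN) {
      this.up = false;
    }
    return this.up;
  }

  start() {
    return setInterval(() => this.check(), PROBE_INTERVAL_MS);
  }
}

// Example probe against the orchestrator's status endpoint (5 s timeout is an assumption).
const httpProbe = async () =>
  (await fetch('http://10.42.0.201:8080/api/status', { signal: AbortSignal.timeout(5000) })).ok;
```

Wired up as new OrchestratorHealth(httpProbe).start(), this reproduces the "down after three failed probes" behavior described under Common Issues.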
Why Two Services?
The separation is intentional. The orchestrator is stateful and compute-focused: it manages build queues, drone state, and package dependency resolution. Making it also handle HTTP routing, authentication, rate limiting, and public API sanitization would add failure modes to the most critical component.
The gateway is stateless and replaceable. If it crashes, nothing is lost — restart it and it immediately starts health-checking the orchestrator again. If the orchestrator crashes, the gateway reports it as down and queues requests until it recovers. Neither component can take the other down with it.
Public API
The gateway exposes sanitized endpoints for public consumption. These are what the Arcturus-Prime website (Arcturus-Prime.com) and Command Center (status.Arcturus-Prime.com) use to display live build data.
Public Endpoints
All public endpoints live under /api/v1/services/public/:
| Endpoint | Description |
|---|---|
| /api/v1/services/public/build-swarm | Current swarm status (sanitized) |
| /api/v1/services/public/build-history | Recent build history (sanitized) |
| /api/v1/services/public/infrastructure | Infrastructure overview (sanitized) |
| /api/v1/services/public/queue | Build queue status (sanitized) |
Sanitization means every public API response passes through the Galactic Identity System before it reaches external consumers. Real hostnames become star system names (e.g., drone-Izar-Host becomes drone-Izar), real IPs are replaced with their mapped equivalents, and internal details such as MAC addresses and credentials are stripped entirely.
This sanitization happens at the gateway level, not in the orchestrator. The orchestrator always works with real names and IPs internally (as used throughout these docs). The gateway translates on the way out for public consumption only.
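A sketch of the sanitizer's general shape follows. It takes a caller-supplied hostname map and IP map, plus an assumed list of sensitive field names to drop. The Galactic Identity System's real mapping table is not published here, so any IP values used with this sketch are placeholders. It uses plain substring replacement. A real implementation would want to match on token boundaries so that one address cannot clobber part of a longer one (for example, 10.0.0.5 inside 10.0.0.50).

```javascript
// Hypothetical sketch of gateway-side sanitization; the real mapping table and
// field names are not published in these docs.
const STRIPPED_KEYS = new Set(['mac', 'mac_address', 'password', 'token', 'api_key']); // assumed names

function makeSanitizer(hostMap, ipMap) {
  // Longest names first so "drone-Izar-Host" is replaced before any shorter overlapping key.
  const replacements = [...Object.entries(hostMap), ...Object.entries(ipMap)]
    .sort(([a], [b]) => b.length - a.length);

  // Plain substring replacement (see the caveat about token boundaries above).
  const scrub = (s) => replacements.reduce((acc, [from, to]) => acc.split(from).join(to), s);

  function sanitize(value) {
    if (typeof value === 'string') return scrub(value);
    if (Array.isArray(value)) return value.map(sanitize);
    if (value && typeof value === 'object') {
      const out = {};
      for (const [k, v] of Object.entries(value)) {
        if (!STRIPPED_KEYS.has(k)) out[k] = sanitize(v); // drop sensitive fields entirely
      }
      return out;
    }
    return value; // numbers, booleans, and null pass through unchanged
  }
  return sanitize;
}
```

Because the gateway applies this on the way out, the orchestrator never needs to know the public names at all.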
Response Format
Public API responses follow a standard envelope (note: these show the sanitized output that external consumers see, not real hostnames):
{
"status": "ok",
"timestamp": "2026-02-23T10:30:00Z",
"data": {
"swarm_version": "2.6.0",
"orchestrator": "orch-Izar",
"drones": [
{
"name": "drone-Izar",
"cores": 16,
"status": "building",
"current_package": "sys-libs/glibc-2.39"
}
],
"queue": {
"pending": 12,
"building": 3,
"completed": 87,
"failed": 0
}
}
}
Note the sanitized names and the absence of real IPs — the public API never leaks internal network topology.
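A consumer of the public API only needs the envelope fields shown above. The helper below (a hypothetical name) rejects non-ok envelopes and derives a few display numbers. How the website actually computes its progress figures isn't documented, so the percentage rule here, which counts failed packages in the total, is an assumption.

```javascript
// Hypothetical consumer-side helper for the public response envelope.
// Field names come from the example response; the derived metrics are illustrative.
function summarizeSwarm(envelope) {
  if (envelope.status !== 'ok') {
    throw new Error(`swarm API returned status "${envelope.status}"`);
  }
  const { drones, queue } = envelope.data;
  const total = queue.pending + queue.building + queue.completed + queue.failed;
  return {
    busyDrones: drones.filter((d) => d.status === 'building').length,
    totalCores: drones.reduce((sum, d) => sum + d.cores, 0),
    percentDone: total === 0 ? 100 : Math.round((queue.completed / total) * 100),
  };
}
```

For the sample response above, this yields one busy drone, 16 cores, and 85 percent done (87 of 102 packages).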
Admin API
For authenticated access with full, unsanitized data, the gateway provides admin endpoints under /api/swarm-admin.
Authentication
Admin endpoints require the X-Admin-Key header with a valid SWARM_ADMIN_KEY. No key, no access: requests with a missing or invalid key get a 401.
curl -H "X-Admin-Key: ${SWARM_ADMIN_KEY}" \
http://10.42.0.199:8090/api/swarm-admin/status
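On the gateway side, the key check amounts to comparing one header against the SWARM_ADMIN_KEY environment variable. A minimal Node sketch follows; the function name and return shape are hypothetical, since the gateway's code isn't shown here. Hashing both values first produces equal-length buffers, which crypto.timingSafeEqual requires. This keeps the comparison from leaking timing information about how much of the key matched.

```javascript
// Hypothetical sketch of the gateway's X-Admin-Key check.
const crypto = require('node:crypto');

function checkAdminKey(headers, expectedKey) {
  const supplied = headers['x-admin-key']; // Node lowercases incoming header names
  if (!expectedKey || typeof supplied !== 'string') return { status: 401 };
  // Equal-length digests let timingSafeEqual compare without throwing on length mismatch.
  const a = crypto.createHash('sha256').update(supplied).digest();
  const b = crypto.createHash('sha256').update(expectedKey).digest();
  return crypto.timingSafeEqual(a, b) ? { status: 200 } : { status: 401 };
}
```

In a Node http handler it would be called as checkAdminKey(req.headers, process.env.SWARM_ADMIN_KEY), answering 401 whenever the returned status says so.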
Admin Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /api/swarm-admin/status | GET | Full unsanitized swarm status |
| /api/swarm-admin/drones | GET | All drones with real IPs and metrics |
| /api/swarm-admin/queue | GET | Full queue with internal details |
| /api/swarm-admin/queue | POST | Submit build requests |
| /api/swarm-admin/queue/{id} | DELETE | Cancel a queued build |
| /api/swarm-admin/drones/{id}/restart | POST | Restart a drone’s agent |
| /api/swarm-admin/config | GET | Current orchestrator configuration |
| /api/swarm-admin/logs | GET | Recent orchestrator and drone logs |
Admin responses include real hostnames, real IPs, and internal metrics that the public API strips. This is the interface used by the build-swarm CLI tool and internal monitoring.
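CLI-style calls against these endpoints reduce to an authenticated fetch. The sketch below covers three of them. Only the paths and the X-Admin-Key header come from this page. The helper names, the empty-body handling for DELETE and POST, and the injectable fetchImpl are assumptions.

```javascript
// Hypothetical admin client; paths come from the table above, everything else is illustrative.
function createAdminClient(baseUrl, adminKey, fetchImpl = globalThis.fetch) {
  async function call(method, path) {
    const res = await fetchImpl(`${baseUrl}/api/swarm-admin${path}`, {
      method,
      headers: { 'X-Admin-Key': adminKey },
    });
    if (res.status === 401) throw new Error('admin key rejected (check SWARM_ADMIN_KEY)');
    if (!res.ok) throw new Error(`${method} ${path}: HTTP ${res.status}`);
    const text = await res.text(); // assumption: DELETE/POST may return an empty body
    return text ? JSON.parse(text) : null;
  }

  return {
    drones: () => call('GET', '/drones'),
    cancelBuild: (id) => call('DELETE', `/queue/${encodeURIComponent(id)}`),
    restartDrone: (id) => call('POST', `/drones/${encodeURIComponent(id)}/restart`),
  };
}
```

For example: createAdminClient('http://10.42.0.199:8090', process.env.SWARM_ADMIN_KEY).drones().then(console.log).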
WebSocket Support
The gateway provides real-time build status via WebSocket connections. This powers the live-updating build progress displays on both Arcturus-Prime.com and status.Arcturus-Prime.com.
Connection
const ws = new WebSocket('wss://swarm.Arcturus-Prime.com/ws/build-status');
ws.onmessage = (event) => {
const update = JSON.parse(event.data);
// update.type: 'drone_status' | 'build_progress' | 'queue_update' | 'build_complete'
// update.data: sanitized payload matching the public API format
};
Event Types
| Event | Fired When |
|---|---|
| drone_status | A drone changes state (ready, building, offline) |
| build_progress | A package build starts, progresses, or completes |
| queue_update | Queue depth changes (new submissions, completions) |
| build_complete | An entire build request finishes (all packages done) |
WebSocket connections receive sanitized data only. There’s no admin WebSocket — the admin API uses polling.
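The browser WebSocket API does not reconnect on its own, so a long-lived status display has to reopen dropped connections itself. The sketch below routes the event types to handlers and reconnects with capped exponential backoff. The backoff constants, the handler-map shape, and the function names are assumptions. Each handler receives whatever the gateway sends in update.data.

```javascript
// Hypothetical client wrapper: routes events by type and reconnects with backoff.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30_000) {
  return Math.min(maxMs, baseMs * 2 ** attempt); // 1 s, 2 s, 4 s, ... capped at 30 s
}

function dispatch(handlers, raw) {
  const update = JSON.parse(raw);
  const handler = handlers[update.type]; // unhandled event types are ignored
  if (handler) handler(update.data);
  return update.type;
}

function connectBuildStatus(url, handlers, WebSocketImpl = globalThis.WebSocket) {
  let attempt = 0;
  const open = () => {
    const ws = new WebSocketImpl(url);
    ws.onopen = () => { attempt = 0; };
    ws.onmessage = (event) => dispatch(handlers, event.data);
    ws.onclose = () => setTimeout(open, backoffDelay(attempt++));
  };
  open();
}

// Usage (browser, or Node 22+ where WebSocket is a global):
// connectBuildStatus('wss://swarm.Arcturus-Prime.com/ws/build-status', {
//   drone_status: (d) => console.log('drone', d),
//   build_complete: (d) => console.log('done', d),
// });
```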
API Proxy Configuration
The Arcturus-Prime website (Arcturus-Prime.com, the Astro site) proxies swarm and gateway API requests through its own domain to avoid CORS issues and centralize access. This is configured in astro.config.mjs:
// astro.config.mjs (simplified)
import { defineConfig } from 'astro/config';
export default defineConfig({
vite: {
server: {
proxy: {
'/api/swarm': {
target: 'https://swarm.Arcturus-Prime.com',
changeOrigin: true,
rewrite: (path) => path.replace(/^\/api\/swarm/, '')
},
'/api/gateway': {
target: 'https://gateway.Arcturus-Prime.com',
changeOrigin: true,
rewrite: (path) => path.replace(/^\/api\/gateway/, '')
}
}
}
}
});
Vite’s server.proxy only applies to the dev server. In production (Cloudflare Workers), the proxy routes are handled by the worker’s fetch handler, which forwards requests to the appropriate backend service.
Proxy Routes
| Arcturus-Prime Path | Backend Target |
|---|---|
| /api/swarm/* | swarm.Arcturus-Prime.com/* |
| /api/gateway/* | gateway.Arcturus-Prime.com/* |
This means the Arcturus-Prime frontend can call /api/swarm/status and get the swarm status without knowing or caring about the actual backend URL. The proxy handles it.
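The production Worker's fetch handler isn't shown in these docs, but the route table above implies logic along these lines. Like the dev-server rewrite, this sketch strips the prefix and keeps the remaining path and query string. Unlike that regex, it only matches the prefix at a path-segment boundary. The names are hypothetical. new Request(target, request) is the usual Workers idiom for forwarding the method, headers, and body.

```javascript
// Hypothetical sketch of the production Worker's routing; the real worker code isn't shown here.
const ROUTES = [
  { prefix: '/api/swarm', origin: 'https://swarm.Arcturus-Prime.com' },
  { prefix: '/api/gateway', origin: 'https://gateway.Arcturus-Prime.com' },
];

// Strip the matched prefix; keep the rest of the path and the query string.
function resolveBackend(requestUrl) {
  const url = new URL(requestUrl);
  for (const { prefix, origin } of ROUTES) {
    if (url.pathname === prefix || url.pathname.startsWith(`${prefix}/`)) {
      return `${origin}${url.pathname.slice(prefix.length) || '/'}${url.search}`;
    }
  }
  return null;
}

const worker = {
  async fetch(request) {
    const target = resolveBackend(request.url);
    if (!target) return new Response('Not found', { status: 404 });
    // Copies method, headers, and body from the incoming request onto the new URL.
    return fetch(new Request(target, request));
  },
};
// In a module Worker this object would be the default export: export default worker;
```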
Deployment and Operations
Starting the Services
The orchestrator and gateway are managed as Docker containers on their respective hosts:
# Orchestrator on Izar-Host (10.42.0.201)
docker compose -f /opt/swarm-orchestrator/docker-compose.yml up -d
# Gateway on Altair-Link (10.42.0.199)
docker compose -f /opt/swarm-gateway/docker-compose.yml up -d
Health Checks
The gateway exposes its own health at /health, and it reports the orchestrator’s health based on its 30-second probe cycle:
# Gateway health
curl http://10.42.0.199:8090/health
# Orchestrator health (via gateway)
curl http://10.42.0.199:8090/api/v1/services/public/build-swarm
# Orchestrator health (direct)
curl http://10.42.0.201:8080/api/status
Common Issues
Gateway can’t reach orchestrator: Check that the orchestrator container is running on Izar-Host (10.42.0.201:8080). The gateway probes every 30 seconds and will mark it as down after 3 failures.
Public API returns stale data: The gateway caches public API responses for 10 seconds to reduce load on the orchestrator. Wait a beat and retry.
Admin API returns 401: Check that X-Admin-Key matches the SWARM_ADMIN_KEY environment variable set on the gateway container.
WebSocket disconnects: The gateway sends pings on a 60-second interval and drops any connection that doesn’t answer them. Browsers reply to ping frames automatically; non-browser clients need a WebSocket library that responds to pings with pong frames.
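The 10-second cache on public responses mentioned under Common Issues can be sketched as a small TTL cache. This describes one way such a cache could work, not the gateway's code. The injectable clock is an assumption, as are two behaviors: concurrent requests for the same endpoint share one in-flight orchestrator call, and failed loads are dropped so an error isn't served for the full TTL.

```javascript
// Hypothetical sketch of a short-lived response cache; not the gateway's actual code.
const CACHE_TTL_MS = 10_000; // public responses are cached for 10 seconds per the docs

function createTtlCache(loader, ttlMs = CACHE_TTL_MS, now = Date.now) {
  const entries = new Map(); // key -> { promise, expires }

  return function get(key) {
    const hit = entries.get(key);
    if (hit && hit.expires > now()) return hit.promise; // fresh: no orchestrator call

    // Store the in-flight promise so concurrent misses share one loader call.
    const promise = Promise.resolve().then(() => loader(key));
    entries.set(key, { promise, expires: now() + ttlMs });
    // Forget failed loads so an error isn't served for the whole TTL.
    promise.catch(() => {
      if (entries.get(key)?.promise === promise) entries.delete(key);
    });
    return promise;
  };
}
```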