
Gateway & Orchestrator

The coordination layer that routes builds, manages queues, and exposes the swarm API

February 22, 2026

The build swarm has two brains: the Orchestrator that decides what gets built and where, and the Gateway that handles discovery, API access, and public endpoints. They run on separate machines, by design — the orchestrator needs to stay focused on build coordination while the gateway handles the messy world of HTTP routing and external access.

Note: These docs use real hostnames and IPs throughout. The public API sanitizes these via the Galactic Identity System before exposing them externally — see the Public API section for details on what that looks like.

Architecture Overview

  Internet / Cloudflare Tunnel
               │
               ▼
┌──────────────────────────────┐
│   Gateway (Altair-Link)      │
│   10.42.0.199:8090/8100      │
│                              │
│   - Discovery/API            │
│   - Public endpoints         │
│   - Health checking          │
│   - Request routing          │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│   Orchestrator (Izar-Host)   │
│   10.42.0.201:8080           │
│                              │
│   - Build queue mgmt         │
│   - Drone coordination       │
│   - State tracking           │
│   - Package distribution     │
└──────────────────────────────┘

The Orchestrator

Location and Access

The orchestrator runs on the Izar-Host Proxmox hypervisor at 10.42.0.201:8080 on the Milky Way. It’s the single source of truth for build state — which packages need building, which drones are working on what, and what’s already done.

What It Does

The orchestrator’s job is coordination, not routing. It doesn’t handle external requests directly; that’s the gateway’s problem. The orchestrator:

  • Manages the build queue: Accepts package lists, deduplicates, resolves dependency ordering, and assigns packages to drones based on core count and current load
  • Tracks drone state: Maintains a real-time view of every drone’s status (ready, building, degraded, offline) via 30-second heartbeat checks
  • Handles failover: When a drone drops out, the orchestrator re-queues its pending packages and redistributes them to remaining drones within 90 seconds
  • Tracks build artifacts: Records which packages have been successfully built and synced to the binhost at 10.42.0.194
  • Exposes internal API: Full build state is available at http://10.42.0.201:8080/api/* for internal consumption
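
As a rough illustration of the heartbeat and failover bullets above, the sketch below classifies a drone from the age of its last heartbeat and re-queues packages held by drones that have gone offline. Only the 30-second heartbeat interval comes from this page. The thresholds, the field names (lastHeartbeatMs, assigned), and both function names are assumptions made for the example, not the orchestrator's actual code.

```javascript
// Heartbeat interval from the docs; the two thresholds below are assumptions.
const HEARTBEAT_MS = 30000;
const DEGRADED_AFTER_MS = HEARTBEAT_MS * 1.5; // roughly one missed heartbeat
const OFFLINE_AFTER_MS = HEARTBEAT_MS * 2.5;  // roughly two missed heartbeats

// Derive a drone's status from the age of its last heartbeat.
function classifyDrone(drone, nowMs) {
  const age = nowMs - drone.lastHeartbeatMs;
  if (age > OFFLINE_AFTER_MS) return 'offline';
  if (age > DEGRADED_AFTER_MS) return 'degraded';
  return drone.assigned.length > 0 ? 'building' : 'ready';
}

// Move packages held by offline drones back onto the pending queue.
// Returns a new queue; clears the offline drones' assignments in place.
function requeueOffline(drones, pending, nowMs) {
  const queue = [...pending];
  for (const drone of drones) {
    if (classifyDrone(drone, nowMs) === 'offline') {
      queue.push(...drone.assigned);
      drone.assigned = [];
    }
  }
  return queue;
}
```

With these assumed thresholds an unresponsive drone is declared offline after 75 seconds, which leaves time to redistribute its work inside the 90-second window described above.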

Orchestrator API Endpoints

These are internal-only endpoints, accessible from the Milky Way or via Tailscale:

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/status | GET | Orchestrator health and version |
| /api/queue | GET | Current build queue with package states |
| /api/queue | POST | Submit new packages for building |
| /api/drones | GET | All registered drones with status |
| /api/drones/{id} | GET | Single drone detail |
| /api/builds | GET | Build history (last 100) |
| /api/builds/{id} | GET | Single build detail with logs |
| /api/packages | GET | All known packages and their binary status |

Queue Management

When a build request arrives (usually from build-swarm submit on Capella-Outpost), the orchestrator:

  1. Parses the package list from the emerge -puDN @world output
  2. Deduplicates against packages already built or currently building
  3. Resolves dependency ordering — if dev-libs/openssl is needed by 15 other packages, it builds first
  4. Scores each package by estimated compile time (based on historical data)
  5. Assigns packages to drones: big packages to high-core drones, small packages to whatever’s free
  6. Tracks progress and handles failures with automatic retry (up to 2 retries per package)

The queue is persistent. If the orchestrator restarts, it recovers queue state and resumes. No builds are lost.
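
The planning steps above (deduplicate, order by dependency, score, assign) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the orchestrator's real scheduler: the function name planBuild, the data shapes, and the greedy "earliest projected finish" rule are all hypothetical, and the sketch ignores time spent waiting on dependencies built by other drones.

```javascript
// Plan a build: dedupe, order by dependencies, then assign to drones.
// requested: string[]; alreadyHandled: Set of built/building packages;
// deps: { pkg: [dependency, ...] }; drones: [{ name, cores }];
// estimateSecs: (pkg) => single-core compile estimate (hypothetical).
function planBuild(requested, alreadyHandled, deps, drones, estimateSecs) {
  // Drop duplicates and anything already built or currently building.
  const wanted = [...new Set(requested)].filter((p) => !alreadyHandled.has(p));

  // Depth-first topological sort so dependencies come before dependents.
  const order = [];
  const state = new Map(); // pkg -> 'visiting' | 'done'
  const visit = (pkg) => {
    if (state.get(pkg) === 'done') return;
    if (state.get(pkg) === 'visiting') throw new Error(`dependency cycle at ${pkg}`);
    state.set(pkg, 'visiting');
    for (const dep of deps[pkg] ?? []) {
      if (wanted.includes(dep)) visit(dep);
    }
    state.set(pkg, 'done');
    order.push(pkg);
  };
  wanted.forEach(visit);

  // Greedy assignment: each package goes to the drone that would finish it
  // earliest, where duration is approximated as estimate / cores.
  const busyUntil = new Map(drones.map((d) => [d.name, 0]));
  const assignment = {};
  for (const pkg of order) {
    let best = null;
    for (const d of drones) {
      const finish = busyUntil.get(d.name) + estimateSecs(pkg) / d.cores;
      if (best === null || finish < best.finish) best = { name: d.name, finish };
    }
    busyUntil.set(best.name, best.finish);
    assignment[pkg] = best.name;
  }
  return { order, assignment };
}
```

Dividing the estimate by core count is what steers expensive packages toward high-core drones in this sketch, while cheap ones land on whichever drone is least loaded.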

The Gateway

Location and Access

The gateway runs on Altair-Link at 10.42.0.199, serving on two ports:

  • Port 8090: Primary API and discovery endpoint
  • Port 8100: Secondary/admin interface

Altair-Link is the services host on the Milky Way — it runs Docker containers for the gateway, Command Center, monitoring stack (Prometheus, Loki, Grafana), and other infrastructure services. The gateway is just one container among many on this box.

What It Does

The gateway is the swarm’s public face. It handles:

  • Service discovery: Clients (including the Arcturus-Prime website) query the gateway to find the active orchestrator and current swarm state
  • Request routing: Incoming build requests are validated and forwarded to the orchestrator
  • Health checking: The gateway probes the orchestrator every 30 seconds and maintains its own view of orchestrator availability
  • Public API: Exposes sanitized swarm data for external consumption through the Galactic Identity System
  • Admin API: Full, unsanitized access for authenticated administrators
  • Binhost URL: Returns the current binhost URL so clients know where to pull binary packages
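
To make the health-checking bullet concrete, here is a small sketch of how probe results might be folded into an up/down view. The threshold of 3 follows the "3 failures" figure under Common Issues. Treating those failures as consecutive, and the HealthTracker class itself, are assumptions for illustration.

```javascript
// Track orchestrator availability from periodic probe results.
// Hypothetical helper: the gateway's real implementation is not shown here.
class HealthTracker {
  constructor(failureThreshold = 3) {
    this.failureThreshold = failureThreshold;
    this.consecutiveFailures = 0;
    this.status = 'up';
  }

  // Record one probe outcome and return the resulting status.
  record(probeSucceeded) {
    if (probeSucceeded) {
      this.consecutiveFailures = 0;
      this.status = 'up';
    } else {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.status = 'down';
      }
    }
    return this.status;
  }
}
```

A caller would invoke record() once per 30-second probe cycle. A single successful probe resets the counter, so one dropped request does not flip the reported status.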

Why Two Services?

The separation is intentional. The orchestrator is stateful and compute-focused: it manages build queues, drone state, and package dependency resolution. Making it also handle HTTP routing, authentication, rate limiting, and public API sanitization would add failure modes to the most critical component.

The gateway is stateless and replaceable. If it crashes, nothing is lost — restart it and it immediately starts health-checking the orchestrator again. If the orchestrator crashes, the gateway reports it as down and queues requests until it recovers. Neither component can take the other down with it.

Public API

The gateway exposes sanitized endpoints for public consumption. These are what the Arcturus-Prime website (Arcturus-Prime.com) and Command Center (status.Arcturus-Prime.com) use to display live build data.

Public Endpoints

All public endpoints live under /api/v1/services/public/:

| Endpoint | Description |
| --- | --- |
| /api/v1/services/public/build-swarm | Current swarm status (sanitized) |
| /api/v1/services/public/build-history | Recent build history (sanitized) |
| /api/v1/services/public/infrastructure | Infrastructure overview (sanitized) |
| /api/v1/services/public/queue | Build queue status (sanitized) |

Sanitization means every public API response passes through the Galactic Identity System before being returned to external consumers. Real hostnames become star system names (e.g., drone-Izar-Host becomes drone-Izar), real IPs become their mapped equivalents, and internal details like MAC addresses and credentials are stripped entirely.

This sanitization happens at the gateway level, not in the orchestrator. The orchestrator always works with real names and IPs internally (as used throughout these docs). The gateway translates on the way out for public consumption only.
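
A sanitization pass of this kind can be sketched as a recursive walk over the response that swaps known hostnames and IPs and drops sensitive keys. The mapping tables, the key names, and the sanitize function below are hypothetical; the actual Galactic Identity System mappings are not given on this page. This sketch also only replaces strings that match exactly and does not rewrite names embedded in longer strings.

```javascript
// Recursively sanitize a JSON-like value.
// maps.hosts / maps.ips: Map of real value -> public value (hypothetical).
// maps.strippedKeys: Set of object keys removed entirely (hypothetical).
function sanitize(value, maps) {
  if (Array.isArray(value)) {
    return value.map((item) => sanitize(item, maps));
  }
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      if (maps.strippedKeys.has(key)) continue; // e.g. MAC addresses, credentials
      out[key] = sanitize(inner, maps);
    }
    return out;
  }
  if (typeof value === 'string') {
    return maps.hosts.get(value) ?? maps.ips.get(value) ?? value;
  }
  return value; // numbers, booleans, null pass through unchanged
}
```

Because the function returns a new structure instead of mutating its input, the unsanitized data stays available for the admin API.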

Response Format

Public API responses follow a standard envelope (note: these show the sanitized output that external consumers see, not real hostnames):

{
  "status": "ok",
  "timestamp": "2026-02-23T10:30:00Z",
  "data": {
    "swarm_version": "2.6.0",
    "orchestrator": "orch-Izar",
    "drones": [
      {
        "name": "drone-Izar",
        "cores": 16,
        "status": "building",
        "current_package": "sys-libs/glibc-2.39"
      }
    ],
    "queue": {
      "pending": 12,
      "building": 3,
      "completed": 87,
      "failed": 0
    }
  }
}

Note the sanitized names and the absence of real IPs — the public API never leaks internal network topology.
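
A consumer of this envelope might reduce it to a few display values, as in the sketch below. The function name summarizeQueue is hypothetical. It also assumes that any status other than "ok" should be treated as an error, which this page does not confirm.

```javascript
// Summarize queue progress from a public API envelope shaped like the
// example response above. Throws if the envelope does not report "ok".
function summarizeQueue(envelope) {
  if (envelope.status !== 'ok') {
    throw new Error(`unexpected status: ${envelope.status}`);
  }
  const { pending, building, completed, failed } = envelope.data.queue;
  const total = pending + building + completed + failed;
  // An empty queue is reported as fully done to avoid dividing by zero.
  const percentDone = total === 0 ? 100 : Math.round((completed / total) * 100);
  const activeDrones = envelope.data.drones.filter(
    (drone) => drone.status === 'building'
  ).length;
  return { total, percentDone, activeDrones };
}
```

For the example response above, this yields a total of 102 packages, 85 percent done, and one active drone.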

Admin API

For authenticated access with full, unsanitized data, the gateway provides admin endpoints under /api/swarm-admin.

Authentication

Admin endpoints require the X-Admin-Key header with a valid SWARM_ADMIN_KEY. No key, no access. Requests without the header get a 401.

curl -H "X-Admin-Key: ${SWARM_ADMIN_KEY}" \
  http://10.42.0.199:8090/api/swarm-admin/status
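
On the server side, the check described above could look something like the following sketch. The function names and the lower-cased headers object are assumptions. The constant-time comparison is a common precaution added for illustration, not something this page says the gateway does.

```javascript
// Compare two strings without exiting early on the first mismatch, so
// response timing does not reveal how much of the key was correct.
function safeEqual(a, b) {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i += 1) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Return the HTTP status an admin request should receive.
// headers: plain object with lower-cased header names (assumption).
// expectedKey: the configured SWARM_ADMIN_KEY value.
function adminAuthStatus(headers, expectedKey) {
  const supplied = headers['x-admin-key'];
  if (typeof supplied !== 'string' || !safeEqual(supplied, expectedKey)) {
    return 401; // missing or wrong key
  }
  return 200;
}
```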

Admin Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/swarm-admin/status | GET | Full unsanitized swarm status |
| /api/swarm-admin/drones | GET | All drones with real IPs and metrics |
| /api/swarm-admin/queue | GET | Full queue with internal details |
| /api/swarm-admin/queue | POST | Submit build requests |
| /api/swarm-admin/queue/{id} | DELETE | Cancel a queued build |
| /api/swarm-admin/drones/{id}/restart | POST | Restart a drone’s agent |
| /api/swarm-admin/config | GET | Current orchestrator configuration |
| /api/swarm-admin/logs | GET | Recent orchestrator and drone logs |

Admin responses include real hostnames, real IPs, and internal metrics that the public API strips. This is the interface used by the build-swarm CLI tool and internal monitoring.

WebSocket Support

The gateway provides real-time build status via WebSocket connections. This powers the live-updating build progress displays on both Arcturus-Prime.com and status.Arcturus-Prime.com.

Connection

const ws = new WebSocket('wss://swarm.Arcturus-Prime.com/ws/build-status');

ws.onmessage = (event) => {
  const update = JSON.parse(event.data);
  // update.type: 'drone_status' | 'build_progress' | 'queue_update' | 'build_complete'
  // update.data: sanitized payload matching the public API format
};

Event Types

| Event | Fired When |
| --- | --- |
| drone_status | A drone changes state (ready, building, offline) |
| build_progress | A package build starts, progresses, or completes |
| queue_update | Queue depth changes (new submissions, completions) |
| build_complete | An entire build request finishes (all packages done) |

WebSocket connections receive sanitized data only. There’s no admin WebSocket — the admin API uses polling.
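
One way a client might consume these events is to fold each update into a local view of the swarm, as sketched below. Only the four event type names come from the table above. The payload fields (name, status, package, requestId), the shape of the view object, and the applyUpdate function are assumptions for illustration.

```javascript
// Fold one WebSocket update into a local, immutable view of the swarm.
// view: { drones: {}, currentPackages: {}, queue: {}, finishedRequests: [] }
function applyUpdate(view, update) {
  const data = update.data;
  switch (update.type) {
    case 'drone_status':
      return { ...view, drones: { ...view.drones, [data.name]: data.status } };
    case 'build_progress':
      return {
        ...view,
        currentPackages: { ...view.currentPackages, [data.name]: data.package },
      };
    case 'queue_update':
      return { ...view, queue: { ...view.queue, ...data } };
    case 'build_complete':
      return {
        ...view,
        finishedRequests: [...view.finishedRequests, data.requestId],
      };
    default:
      return view; // ignore event types this client does not know
  }
}
```

Inside the onmessage handler shown earlier, this would be called as view = applyUpdate(view, update). Returning the view unchanged for unknown types keeps older clients working if new event types are added later.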

API Proxy Configuration

The Arcturus-Prime website (Arcturus-Prime.com, the Astro site) proxies swarm and gateway API requests through its own domain to avoid CORS issues and centralize access. For local development, this is configured through the Vite dev-server proxy in astro.config.mjs:

// astro.config.mjs (simplified)
import { defineConfig } from 'astro/config';

export default defineConfig({
  vite: {
    server: {
      proxy: {
        '/api/swarm': {
          target: 'https://swarm.Arcturus-Prime.com',
          changeOrigin: true,
          rewrite: (path) => path.replace(/^\/api\/swarm/, '')
        },
        '/api/gateway': {
          target: 'https://gateway.Arcturus-Prime.com',
          changeOrigin: true,
          rewrite: (path) => path.replace(/^\/api\/gateway/, '')
        }
      }
    }
  }
});

In production (Cloudflare Workers), the proxy routes are handled by the worker’s fetch handler, which forwards requests to the appropriate backend service.

Proxy Routes

| Arcturus-Prime Path | Backend Target |
| --- | --- |
| /api/swarm/* | swarm.Arcturus-Prime.com/* |
| /api/gateway/* | gateway.Arcturus-Prime.com/* |

This means the Arcturus-Prime frontend can call /api/swarm/status and get the swarm status without knowing or caring about the actual backend URL. The proxy handles it.
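
The path rewriting in the table above can be expressed as a small pure function, which is roughly what a production fetch handler would need before forwarding a request. The worker's real code is not shown on this page, so the ROUTES constant and the resolveBackend function are hypothetical.

```javascript
// Route table mirroring the Proxy Routes table above.
const ROUTES = [
  { prefix: '/api/swarm', target: 'https://swarm.Arcturus-Prime.com' },
  { prefix: '/api/gateway', target: 'https://gateway.Arcturus-Prime.com' },
];

// Map an incoming path to its backend URL, or return null if no route matches.
function resolveBackend(pathname) {
  for (const { prefix, target } of ROUTES) {
    // Require an exact match or a "/" boundary after the prefix.
    if (pathname === prefix || pathname.startsWith(prefix + '/')) {
      return target + pathname.slice(prefix.length);
    }
  }
  return null;
}
```

The boundary check is a deliberate choice in this sketch: a bare prefix test would also match paths such as /api/swarm-admin/status, which belong to the gateway's admin API, not to the proxied swarm routes.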

Deployment and Operations

Starting the Services

The orchestrator and gateway are managed as Docker containers on their respective hosts:

# Orchestrator on Izar-Host (10.42.0.201)
docker compose -f /opt/swarm-orchestrator/docker-compose.yml up -d

# Gateway on Altair-Link (10.42.0.199)
docker compose -f /opt/swarm-gateway/docker-compose.yml up -d

Health Checks

The gateway exposes its own health at /health, and it reports the orchestrator’s health based on its 30-second probe cycle:

# Gateway health
curl http://10.42.0.199:8090/health

# Orchestrator health (via gateway)
curl http://10.42.0.199:8090/api/v1/services/public/build-swarm

# Orchestrator health (direct)
curl http://10.42.0.201:8080/api/status

Common Issues

Gateway can’t reach orchestrator: Check that the orchestrator container is running on Izar-Host (10.42.0.201:8080). The gateway probes every 30 seconds and will mark it as down after 3 failures.

Public API returns stale data: The gateway caches public API responses for 10 seconds to reduce load on the orchestrator. Wait at least 10 seconds and retry.
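
For intuition about why data can lag by up to 10 seconds, here is a minimal time-to-live cache of the kind described. The TtlCache class is a hypothetical sketch, not the gateway's code. It uses a synchronous loader and an injectable clock to keep the example short and testable, whereas a real gateway would load asynchronously.

```javascript
// Minimal time-to-live cache. The 10-second default matches the figure above.
class TtlCache {
  constructor(ttlMs = 10000, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful in tests
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  // Return the cached value for key, or call load() and cache its result.
  get(key, load) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) {
      return hit.value; // still fresh: the orchestrator is not contacted
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

Any request arriving within the TTL gets the stored response, so a change on the orchestrator becomes visible only after the entry expires.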

Admin API returns 401: Check that X-Admin-Key matches the SWARM_ADMIN_KEY environment variable set on the gateway container.

WebSocket disconnects: The gateway has a 60-second ping interval. If the client doesn’t respond to pings, the connection is dropped. Make sure your WebSocket client handles ping/pong frames.

Tags: build-swarm, gateway, orchestrator, api, infrastructure