Health Monitor & Site Audit
Unified site audit with service health, HTTP smoke testing, system validation, and console error capture
Unified Audit Page
The Site Audit at /admin/audit consolidates all testing into a single page with 4 sections that run in parallel. One click runs everything and produces a single exportable JSON report.
| Section | What It Does | Source |
|---|---|---|
| Service Health | Probes 12 backend services via /api/admin/health-check | API fetch |
| HTTP Smoke Test | Fetches ~180 URLs across 21 suites, checks status codes | Client-side |
| System Validation | Deep probes via /api/admin/system-test (services + modules + env + deps) | API fetch |
| Console Audit | Loads ~155 pages in hidden iframes, captures console errors/warnings | Iframe capture |
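A minimal sketch of how four independent sections could run in parallel and be merged into one exportable report. The type names (`AuditItem`, `SectionResult`, `AuditReport`) and the `runAudit` function are illustrative assumptions, not the actual code in audit.astro.

```typescript
// Hypothetical shapes: the real audit page may structure its results differently.
type Status = "pass" | "warn" | "fail";

interface AuditItem { name: string; status: Status; detail?: string }
interface SectionResult { section: string; items: AuditItem[] }
interface AuditReport {
  summary: { pass: number; warn: number; fail: number; totalMs: number };
  sections: SectionResult[];
}

type SectionRunner = () => Promise<SectionResult>;

async function runAudit(runners: Record<string, SectionRunner>): Promise<AuditReport> {
  const started = Date.now();
  const names = Object.keys(runners);

  // Start every section at once. A section that rejects becomes a single
  // failed item instead of aborting the whole audit.
  const sections = await Promise.all(
    names.map((name) =>
      runners[name]().catch((err): SectionResult => ({
        section: name,
        items: [{ name, status: "fail", detail: String(err) }],
      }))
    )
  );

  // Tally pass/warn/fail across all sections for the summary row.
  const summary = { pass: 0, warn: 0, fail: 0, totalMs: 0 };
  for (const section of sections) {
    for (const item of section.items) summary[item.status]++;
  }
  summary.totalMs = Date.now() - started;
  return { summary, sections };
}
```

Under these assumptions, the JSON export reduces to `JSON.stringify(report, null, 2)` on the returned object.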
Features: Summary row (pass/warn/fail/time), cross-section “Issues Only” filter, collapsible sections, JSON export, clipboard copy, per-item retry buttons, section-level retry, “Retry All Failed” action bar, auto-heal system.
Files: src/pages/admin/audit.astro (~1,380 lines), src/config/modules/health-monitor.ts (manifest).
Legacy individual pages (/admin/health, /admin/site-test, /admin/system-test, /admin/f12-audit) remain functional but are no longer linked from the admin sidebar.
Action Buttons
Every failed check gets inline action buttons:
| Button | Section | Action |
|---|---|---|
| Retry | Health, HTTP, Console | Re-probes the single item and updates the row in-place |
| Open | HTTP, Console | Opens the URL/page in a new tab for manual inspection |
| Configure Keys | Health | Links to /admin/api-keys when the failure is due to missing env vars |
| Full Dashboard | Health | Links to /admin/health for the standalone view |
Section headers gain a Retry button when issues exist. A fixed action bar at the bottom shows total retryable issues and a “Retry All Failed” button.
Self-Healing (Auto-Heal)
After the audit completes, if the Auto-heal checkbox (header, default: on) is enabled and failures exist, the system automatically retries:
- Up to 3 attempts with 8-second delays between each
- Retries failed services (via health-check API) and failed HTTP URLs (client-side re-fetch)
- Skips unconfigured services (those need env vars, not retries)
- Progress indicator shows attempt number and countdown in the header
- Stops early if all items recover
- Final status shows “X healed · Y need attention” or “Completed in Xs”
This pattern is ported from the standalone health dashboard (/admin/health) which uses the same 3-attempt/8s-delay rhythm.
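The retry rhythm described above can be sketched as a small loop. This is an illustration under stated assumptions: `FailedItem`, `reprobe`, and the injectable `sleep` are hypothetical names, and waiting before each attempt (rather than only between attempts) is a guess at the timing.

```typescript
// Hypothetical item shape; the real page likely tracks more fields.
interface FailedItem {
  id: string;
  status: "online" | "degraded" | "offline" | "unconfigured";
}

interface HealOptions {
  maxAttempts?: number;                  // the text describes 3 attempts
  delayMs?: number;                      // the text describes 8-second delays
  sleep?: (ms: number) => Promise<void>; // injectable so the loop can be tested
}

async function autoHeal(
  items: FailedItem[],
  reprobe: (item: FailedItem) => Promise<boolean>, // resolves true if the item recovered
  opts: HealOptions = {}
): Promise<{ healed: string[]; needAttention: string[] }> {
  const maxAttempts = opts.maxAttempts ?? 3;
  const delayMs = opts.delayMs ?? 8000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  // Unconfigured services need env vars, not retries, so they are never re-probed.
  let pending = items.filter((i) => i.status === "degraded" || i.status === "offline");
  const healed: string[] = [];

  // The loop condition stops early once nothing is left to retry.
  for (let attempt = 1; attempt <= maxAttempts && pending.length > 0; attempt++) {
    await sleep(delayMs); // assumption: countdown runs before each attempt
    const results = await Promise.all(
      pending.map((item) => reprobe(item).catch(() => false))
    );
    const stillFailing: FailedItem[] = [];
    pending.forEach((item, idx) => {
      if (results[idx]) healed.push(item.id);
      else stillFailing.push(item);
    });
    pending = stillFailing;
  }

  const unconfigured = items.filter((i) => i.status === "unconfigured").map((i) => i.id);
  return { healed, needAttention: [...pending.map((i) => i.id), ...unconfigured] };
}
```

The returned arrays map directly onto a status line such as "X healed · Y need attention".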
Smart System Test (probeFrom)
The system validation API (/api/admin/system-test) probes each module’s API route. However, some modules proxy to internal services (OpenClaw, Ollama, Argonaut, etc.) that are only reachable from the browser via Tailscale, not from the CF Worker edge.
The probeFrom field on ModuleManifest controls this:
| Value | Behavior |
|---|---|
| 'worker' (default) | System test probes the API route from the CF Worker |
| 'browser' | System test skips the probe, returns “skipped (browser-only)” |
Modules with probeFrom: 'browser': openclaw, ollama, argonaut, pentest, servers, workbench, network-discovery, jobs.
This eliminates ~24 false timeout failures that previously appeared when the CF Worker couldn’t reach internal services.
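A sketch of how the skip decision might look in the system-test API. Only the `probeFrom` field and its two values come from the text above; the other manifest fields, the `ProbeResult` shape, and the injected `probe` function are hypothetical.

```typescript
// Only `probeFrom` is taken from the docs; `id` and `apiRoute` are illustrative.
interface ModuleManifest {
  id: string;
  apiRoute: string;
  probeFrom?: "worker" | "browser";
}

type ProbeResult =
  | { id: string; status: "skipped"; reason: string }
  | { id: string; status: "pass" | "fail"; httpCode: number };

async function probeModule(
  mod: ModuleManifest,
  probe: (route: string) => Promise<number> // resolves to an HTTP status code
): Promise<ProbeResult> {
  // 'worker' is the default when the field is omitted.
  if ((mod.probeFrom ?? "worker") === "browser") {
    return { id: mod.id, status: "skipped", reason: "skipped (browser-only)" };
  }
  const httpCode = await probe(mod.apiRoute);
  return { id: mod.id, status: httpCode < 400 ? "pass" : "fail", httpCode };
}
```

Because the skip happens before any network call, a browser-only module costs nothing at the edge and can never produce a timeout.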
Console Audit Noise Filtering
The console audit filters known-harmless messages:
Noise patterns (suppressed): React DevTools prompts, Lit dev mode, Vite HMR, DevTools source map errors, browser extension URLs, ad-blocker blocks, Cytoscape wheelSensitivity warnings, Network API unavailable.
Trailing slash redirects: Pages that redirect from /path to /path/ are not counted as warnings. Known redirect paths (/dashboard, /dashboard/customize) are fully suppressed.
Cross-origin redirects: Pages that redirect to CF Access login are flagged as warnings (expected behavior for admin pages when not authenticated).
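The filtering rules can be approximated with a pattern list and a path comparison. The regular expressions below are illustrative guesses at the message text for each noise source; the actual list in audit.astro may differ.

```typescript
// Illustrative patterns only, one per noise source named in the text.
const NOISE_PATTERNS: RegExp[] = [
  /Download the React DevTools/i,
  /Lit is in dev mode/i,
  /\[vite\]/i,
  /DevTools failed to load source map/i,
  /(chrome|moz)-extension:\/\//i,
  /ERR_BLOCKED_BY_CLIENT/i,
  /wheel ?sensitivity/i,
];

// True when a console message matches any known-harmless pattern.
function isNoise(message: string): boolean {
  return NOISE_PATTERNS.some((pattern) => pattern.test(message));
}

// True when the final path is the requested path plus one trailing slash.
function isTrailingSlashRedirect(requested: string, final: string): boolean {
  return requested !== final && requested + "/" === final;
}
```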
Accessibility Spot-Checks
During the console audit, each iframe also runs basic accessibility checks:
| Check | Triggers Warning |
|---|---|
| <title> | Missing or empty document title |
| img[alt] | Images without alt attributes (count reported) |
| <h1> | No h1 element found on the page |
These run inside a try/catch — cross-origin pages (CF Access redirects) silently skip a11y checks.
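The three checks and the try/catch guard might look roughly like this. `DocLike` is a hypothetical structural type used so the sketch can run outside a browser; an iframe's `contentDocument` satisfies it. The warning strings are illustrative.

```typescript
// Minimal structural type; a real Document has all three members.
interface DocLike {
  title: string;
  querySelector(selector: string): unknown;
  querySelectorAll(selector: string): { length: number };
}

function a11yWarnings(getDoc: () => DocLike | null): string[] {
  const warnings: string[] = [];
  try {
    // Reaching into a cross-origin frame either throws or yields null.
    const doc = getDoc();
    if (!doc) return warnings;

    if (!doc.title.trim()) warnings.push("Missing or empty document title");

    const missingAlt = doc.querySelectorAll("img:not([alt])").length;
    if (missingAlt > 0) warnings.push(`${missingAlt} image(s) without alt attributes`);

    if (!doc.querySelector("h1")) warnings.push("No h1 element found");
  } catch {
    // Cross-origin page (for example a CF Access redirect): skip silently.
  }
  return warnings;
}
```

In a browser this would be called as `a11yWarnings(() => iframe.contentDocument)` once the frame has loaded.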
Service Health Dashboard (Detail)
The Health Monitor at /admin/health (also Section 1 of the unified audit) is a service health dashboard that probes all backend services in parallel and renders diagnostic cards. It uses a service registry architecture: all service metadata (icons, env vars, endpoints, diagnostic hints) lives in the API, so the frontend is a pure data renderer with no hardcoded service knowledge.
Architecture
Service Registry (API)
The API at /api/admin/health-check contains a service registry — an array of ServiceDef objects, each defining:
| Field | Purpose |
|---|---|
| id | Unique service identifier (e.g., forge-relay) |
| name | Display name |
| category | ai, infrastructure, or external |
| icon | FontAwesome class (e.g., fa-solid fa-brain) |
| envVars | Required environment variables |
| endpoint | Health check URL path |
| hints | Per-error-code diagnostic messages (keyed by HTTP code, code class, or default) |
| getCheck() | Probe function that returns status + response time |
When a service fails, the API returns an enriched diagnostic object:
{
  "httpCode": 401,
  "meaning": "Unauthorized — token invalid or missing",
  "hint": "Relay secret mismatch. Verify FORGE_RELAY_SECRET matches the remote service config.",
  "envVars": ["FORGE_RELAY_URL", "FORGE_RELAY_SECRET"],
  "endpoint": "/health"
}
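The hint in that object comes from the `hints` map, which is keyed by HTTP code, code class, or default. A possible lookup order is sketched below. The key spellings ("401", "4xx", "default") and the `resolveHint` name are assumptions; the source names the three kinds of key but not their format.

```typescript
// Assumed key formats: exact code ("401"), code class ("4xx"), then "default".
type Hints = Partial<Record<string, string>>;

function resolveHint(hints: Hints, httpCode: number): string | undefined {
  const exact = String(httpCode);
  const codeClass = `${Math.floor(httpCode / 100)}xx`;
  // Most specific key wins; fall through to the generic message.
  return hints[exact] ?? hints[codeClass] ?? hints["default"];
}
```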
Single-Service Retry
The API supports GET /api/admin/health-check?service=forge-relay to re-probe a single service without re-checking all 12. This powers the per-card retry buttons and the self-healing system.
Response Shape
{
  "services": [ ... ],
  "meta": {
    "totalMs": 1234,
    "timestamp": "2026-02-28T13:20:16.000Z",
    "serviceCount": 12
  }
}
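On the client side, the `?service=` query and the response shape can be captured in a URL builder and a type guard. This is a sketch: the element type of `services` is elided in the source and is left as `unknown` here, and whether a single-service request returns the same envelope is an assumption.

```typescript
interface HealthCheckResponse {
  services: unknown[]; // element shape is not shown in the docs
  meta: { totalMs: number; timestamp: string; serviceCount: number };
}

// Builds the full-check URL, or the single-service URL when an id is given.
function healthCheckUrl(serviceId?: string): string {
  const base = "/api/admin/health-check";
  return serviceId ? `${base}?service=${encodeURIComponent(serviceId)}` : base;
}

// Runtime check that parsed JSON has the documented top-level fields.
function isHealthCheckResponse(value: unknown): value is HealthCheckResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { services?: unknown; meta?: unknown };
  const m = v.meta as Record<string, unknown> | null | undefined;
  return (
    Array.isArray(v.services) &&
    typeof m === "object" &&
    m !== null &&
    typeof m.totalMs === "number" &&
    typeof m.timestamp === "string" &&
    typeof m.serviceCount === "number"
  );
}
```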
Frontend Features
Summary Bar
Four stat cards showing counts for Online, Degraded, Offline, and Unconfigured. Cards gain a colored glow border when their count is greater than zero.
Instructions Panel
Collapsible panel (click “How to Use”) with four help cards:
- Live Monitoring — 15-second auto-refresh
- Diagnostics — Error codes, causes, and env var guidance
- Self-Healing — Automatic retry behavior
- Configuration — How to add/update services via CF Pages env vars
Filter Tabs
Three filter tabs with live counts:
- All — Every service
- Issues — Degraded + Offline + Unconfigured
- Online — Healthy services only
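The tab semantics reduce to one predicate over the four statuses. The type and function names below are illustrative, not taken from the page source.

```typescript
type ServiceStatus = "online" | "degraded" | "offline" | "unconfigured";
type FilterTab = "all" | "issues" | "online";

function matchesTab(status: ServiceStatus, tab: FilterTab): boolean {
  if (tab === "all") return true;
  if (tab === "online") return status === "online";
  return status !== "online"; // issues = degraded + offline + unconfigured
}

// Live counts shown on each tab.
function tabCounts(statuses: ServiceStatus[]): Record<FilterTab, number> {
  const count = (tab: FilterTab) => statuses.filter((s) => matchesTab(s, tab)).length;
  return { all: count("all"), issues: count("issues"), online: count("online") };
}
```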
Service Cards
Each service renders as a glassmorphism card with:
- 40px icon circle — Colored by status (green/amber/red)
- Service name — Bold white text
- Response time — Color-coded monospace (green <200ms, amber <500ms, red >500ms)
- Timestamp — Last checked time
- Status badge pill — With animated dot indicator
- Response time bar — Gradient fill showing relative latency (capped at 1000ms)
- 3px left accent border — Green (online), amber (degraded), red (offline), dashed (unconfigured)
- Hover lift — Subtle translateY + shadow on mouse hover
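The response-time colour thresholds and the capped latency bar can be expressed as two small functions. Function names are hypothetical, and treating exactly 500ms as red is an assumption, since the text leaves that boundary unspecified.

```typescript
function latencyColor(ms: number): "green" | "amber" | "red" {
  if (ms < 200) return "green";
  if (ms < 500) return "amber";
  return "red"; // assumption: 500ms itself counts as red
}

// Width of the gradient bar as a percentage, capped at 1000ms by default.
function latencyBarPercent(ms: number, capMs = 1000): number {
  const clamped = Math.min(Math.max(ms, 0), capMs);
  return Math.round((clamped / capMs) * 100);
}
```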
Diagnostic Panels
For any non-online service, the card expands to show:
- Error headline — HTTP code + human-readable meaning (e.g., “HTTP 401 — Unauthorized”)
- Fix hint — Actionable troubleshooting text from the service registry
- Env var pills — Purple monospace tags showing which variables to check
- Endpoint pill — Blue monospace tag showing which URL was probed
Retry Buttons
Each failed service card has a retry button. Clicking it:
- Shows a loading spinner on the button
- Adds a purple pulse animation to the card
- Re-probes only that service via ?service=id
- If the service recovers, flashes the card green
The toolbar also has a “Retry All Failed” button that retries every non-online service in parallel.
Self-Healing
On initial load (and after each full refresh), the system automatically schedules retries for failed services:
- Up to 3 attempts with 8-second delays between each
- A status indicator in the toolbar shows the countdown and attempt number
- Only retries degraded and offline services (not unconfigured)
- Stops early if all services recover
- Shows “All services recovered!” or “Manual intervention needed” when done
Services Monitored
AI Services
| Service | Icon | Endpoint | Env Vars |
|---|---|---|---|
| OpenClaw Gateway | fa-brain | /v1/models | OPENCLAW_API_TOKEN, OPENCLAW_API_URL |
| Ollama | fa-microchip | /api/tags | OLLAMA_API_URL |
| Argonaut Daemon | fa-robot | /health | ARGONAUT_DAEMON_URL, ARGONAUT_DAEMON_TOKEN |
Infrastructure
| Service | Icon | Endpoint | Env Vars |
|---|---|---|---|
| Build Swarm Gateway | fa-cubes | /status | GATEWAY_API_URL |
| Forge Relay | fa-tower-broadcast | /health | FORGE_RELAY_URL, FORGE_RELAY_SECRET |
| Pentest Sentinel | fa-shield-halved | /api/health | PENTEST_VPS_DAEMON_URL, PENTEST_VPS_API_KEY |
| Meridian-Host AdminBox | fa-hard-drive | /api/health | MM_ARGOBOX_URL, MM_ARGOBOX_TOKEN |
| Tarn-Host AdminBox | fa-server | /api/health | TITAN_ADMINBOX_URL, TITAN_ADMINBOX_TOKEN |
| Uptime Kuma | fa-chart-line | /api/status-page/public | UPTIME_KUMA_URL |
External APIs
| Service | Icon | Endpoint | Env Vars |
|---|---|---|---|
| Twilio | fa-phone | /2010-04-01/Accounts | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN |
| ElevenLabs | fa-volume-high | /v1/voices | ELEVENLABS_API_KEY |
| Cloudflare | fa-cloud | /client/v4/user/tokens/verify | CF_API_TOKEN |
CSS Architecture
The page uses <style is:global> because all service cards are generated dynamically via JavaScript innerHTML. Astro’s default scoped styles add data-astro-cid-* attributes to CSS selectors — dynamically created DOM elements don’t receive these attributes, so scoped styles silently fail.
All class names are prefixed to avoid collisions:
- .hm-*: Page layout (header, summary, toolbar, grid)
- .svc-*: Service card elements (icon, name, badge, retry)
- .diag-*: Diagnostic panel elements (error, hint, pills)
Common Degraded States
| Service | Typical Error | Cause | Fix |
|---|---|---|---|
| Forge Relay | HTTP 401 | Secret sent as wrong header type | Auth header must be Authorization: Bearer <secret> (not X-Relay-Secret) |
| Pentest Sentinel | HTTP 401 | Wrong auth header or path | Uses X-Api-Key header (not Bearer) at /api/health (not /health) |
| Meridian-Host AdminBox | HTTP 404 | Wrong health endpoint path | Health endpoint is /api/health (not /health) |
| Tarn-Host AdminBox | HTTP 404 | Wrong health endpoint path | Health endpoint is /api/health (not /health) |
| Uptime Kuma | HTTP 403/404 | Wrong status page slug or URL | Slug is public (not heartbeat), URL should be https://status.Arcturus-Prime.com |
| ElevenLabs | HTTP 401 | Endpoint requires elevated permissions | Use /v1/voices instead of /v1/user/subscription (doesn’t need user_read scope) |
Adding a New Service
- Add a new ServiceDef entry in the SERVICES array in src/pages/api/admin/health-check.ts
- Define id, name, category, icon, envVars, endpoint, hints, and getCheck()
- The frontend renders it automatically; no frontend changes needed
- Add the env vars to CF Pages > Settings > Environment Variables
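As an illustration of these steps, a registry entry might look like the sketch below. The field names follow the ServiceDef table; everything else is hypothetical, including the field types, the `getCheck` signature, the `FetchLike` helper type, the "example-api" service and its env vars, and the choice to report non-2xx responses as degraded.

```typescript
// Hypothetical types; the real ServiceDef in health-check.ts may differ.
type CheckResult = {
  status: "online" | "degraded" | "offline" | "unconfigured";
  responseMs: number;
  httpCode?: number;
};
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ ok: boolean; status: number }>;

interface ServiceDef {
  id: string;
  name: string;
  category: "ai" | "infrastructure" | "external";
  icon: string;
  envVars: string[];
  endpoint: string;
  hints: Record<string, string>;
  getCheck(env: Record<string, string | undefined>, fetchFn: FetchLike): Promise<CheckResult>;
}

// Made-up service used purely as an example entry.
const exampleService: ServiceDef = {
  id: "example-api",
  name: "Example API",
  category: "external",
  icon: "fa-solid fa-plug",
  envVars: ["EXAMPLE_API_URL", "EXAMPLE_API_KEY"],
  endpoint: "/health",
  hints: {
    "401": "API key rejected. Verify EXAMPLE_API_KEY.",
    default: "Service unreachable. Verify EXAMPLE_API_URL.",
  },
  async getCheck(env, fetchFn) {
    const url = env.EXAMPLE_API_URL;
    const key = env.EXAMPLE_API_KEY;
    // Missing env vars are reported as unconfigured, without probing.
    if (!url || !key) return { status: "unconfigured", responseMs: 0 };

    const started = Date.now();
    try {
      const res = await fetchFn(url + exampleService.endpoint, {
        headers: { Authorization: `Bearer ${key}` },
      });
      const status: CheckResult["status"] = res.ok ? "online" : "degraded";
      return { status, responseMs: Date.now() - started, httpCode: res.status };
    } catch {
      // Network-level failure: no HTTP code available.
      return { status: "offline", responseMs: Date.now() - started };
    }
  },
};
```

Passing the fetch function in as a parameter is a testing convenience in this sketch; the global `fetch` fits the `FetchLike` type.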
Files
| File | Purpose |
|---|---|
| src/pages/admin/audit.astro | Unified audit page: all 4 sections + self-healing + a11y checks |
| src/pages/admin/health.astro | Legacy: standalone service health dashboard |
| src/pages/admin/site-test.astro | Legacy: standalone HTTP smoke test |
| src/pages/admin/system-test.astro | Legacy: standalone system validation |
| src/pages/admin/f12-audit.astro | Legacy: standalone console audit |
| src/pages/api/admin/health-check.ts | API: service registry + probes (12 services) |
| src/pages/api/admin/system-test.ts | API: deep probes + probeFrom skip logic |
| src/config/modules/health-monitor.ts | Module manifest (nav → /admin/audit) |
| src/config/module-manifest.ts | ModuleManifest type, includes probeFrom field |