Health Monitor & Site Audit

Unified site audit with service health, HTTP smoke testing, system validation, and console error capture

March 1, 2026

Unified Audit Page

The Site Audit at /admin/audit consolidates all testing into a single page with 4 sections that run in parallel. One click runs everything and produces a single exportable JSON report.

| Section | What It Does | Source |
| --- | --- | --- |
| Service Health | Probes 12 backend services via /api/admin/health-check | API fetch |
| HTTP Smoke Test | Fetches ~180 URLs across 21 suites, checks status codes | Client-side |
| System Validation | Deep probes via /api/admin/system-test (services + modules + env + deps) | API fetch |
| Console Audit | Loads ~155 pages in hidden iframes, captures console errors/warnings | Iframe capture |

Features: Summary row (pass/warn/fail/time), cross-section “Issues Only” filter, collapsible sections, JSON export, clipboard copy, per-item retry buttons, section-level retry, “Retry All Failed” action bar, auto-heal system.
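
The one-click parallel run and the pass/warn/fail/time summary could be sketched roughly as follows. This is a minimal illustration, not the code in audit.astro: the names `runAudit`, `SectionRunner`, `AuditItem`, and `AuditReport` are hypothetical, and each section is assumed to be an async function that resolves to a list of results.

```typescript
// Hypothetical types: the real audit page may structure its results differently.
type Status = "pass" | "warn" | "fail";

interface AuditItem {
  label: string;
  status: Status;
}

interface SectionResult {
  section: string;
  items: AuditItem[];
}

type SectionRunner = () => Promise<AuditItem[]>;

interface AuditReport {
  summary: { pass: number; warn: number; fail: number; totalMs: number };
  sections: SectionResult[];
}

// Run every section concurrently. A section that throws is recorded as a
// single "fail" item so one broken section cannot abort the whole report.
async function runAudit(runners: Record<string, SectionRunner>): Promise<AuditReport> {
  const started = Date.now();
  const names = Object.keys(runners);
  const settled = await Promise.allSettled(names.map((name) => runners[name]()));

  const sections: SectionResult[] = settled.map((outcome, i) => ({
    section: names[i],
    items:
      outcome.status === "fulfilled"
        ? outcome.value
        : [{ label: `${names[i]} crashed: ${String(outcome.reason)}`, status: "fail" as const }],
  }));

  const summary = { pass: 0, warn: 0, fail: 0, totalMs: 0 };
  for (const section of sections) {
    for (const item of section.items) summary[item.status] += 1;
  }
  summary.totalMs = Date.now() - started;
  return { summary, sections };
}
```

Because the report is a plain object, `JSON.stringify(report, null, 2)` would be enough to feed both a JSON export and a clipboard copy.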

Files: src/pages/admin/audit.astro (~1,380 lines), src/config/modules/health-monitor.ts (manifest).

Legacy individual pages (/admin/health, /admin/site-test, /admin/system-test, /admin/f12-audit) remain functional but are no longer linked from the admin sidebar.

Action Buttons

Every failed check gets inline action buttons:

| Button | Section | Action |
| --- | --- | --- |
| Retry | Health, HTTP, Console | Re-probes the single item and updates the row in-place |
| Open | HTTP, Console | Opens the URL/page in a new tab for manual inspection |
| Configure Keys | Health | Links to /admin/api-keys when the failure is due to missing env vars |
| Full Dashboard | Health | Links to /admin/health for the standalone view |

Section headers gain a Retry button when issues exist. A fixed action bar at the bottom shows total retryable issues and a “Retry All Failed” button.

Self-Healing (Auto-Heal)

After the audit completes, if the Auto-heal checkbox in the header (on by default) is enabled and at least one check failed, the system automatically retries the failed items:

  • Up to 3 attempts with 8-second delays between each
  • Retries failed services (via health-check API) and failed HTTP URLs (client-side re-fetch)
  • Skips unconfigured services (those need env vars, not retries)
  • Progress indicator shows attempt number and countdown in the header
  • Stops early if all items recover
  • Final status shows “X healed · Y need attention” or “Completed in Xs”

This pattern is ported from the standalone health dashboard (/admin/health) which uses the same 3-attempt/8s-delay rhythm.
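
A minimal sketch of that retry rhythm, assuming each failed item can be re-probed by an async function that resolves to `true` on recovery. The names `autoHeal`, `FailedItem`, and `Probe` are hypothetical. The sketch also assumes the 8-second wait happens before every attempt, including the first; the text only says the delays fall between attempts.

```typescript
// Hypothetical shape: the real page tracks more state per item.
interface FailedItem {
  id: string;
  unconfigured: boolean; // needs env vars, so retrying cannot help
}

type Probe = (item: FailedItem) => Promise<boolean>; // true = recovered

const MAX_ATTEMPTS = 3;
const DELAY_MS = 8000;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function autoHeal(
  failed: FailedItem[],
  probe: Probe,
  wait: (ms: number) => Promise<void> = sleep, // injectable so tests need not wait
): Promise<{ healed: string[]; needAttention: string[] }> {
  // Unconfigured items are never retried, but they still need attention.
  let pending = failed.filter((f) => !f.unconfigured);
  const skipped = failed.filter((f) => f.unconfigured).map((f) => f.id);
  const healed: string[] = [];

  // The loop condition stops early once nothing is left to retry.
  for (let attempt = 1; attempt <= MAX_ATTEMPTS && pending.length > 0; attempt++) {
    await wait(DELAY_MS); // assumption: wait before each attempt
    const results = await Promise.all(
      pending.map((item) => probe(item).catch(() => false)),
    );
    healed.push(...pending.filter((_, i) => results[i]).map((f) => f.id));
    pending = pending.filter((_, i) => !results[i]);
  }
  return { healed, needAttention: [...pending.map((f) => f.id), ...skipped] };
}
```

The two returned lists map directly onto the "X healed · Y need attention" status line.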

Smart System Test (probeFrom)

The system validation API (/api/admin/system-test) probes each module’s API route. However, some modules proxy to internal services (OpenClaw, Ollama, Argonaut, etc.) that are only reachable from the browser via Tailscale, not from the CF Worker edge.

The probeFrom field on ModuleManifest controls this:

| Value | Behavior |
| --- | --- |
| 'worker' (default) | System test probes the API route from the CF Worker |
| 'browser' | System test skips the probe, returns “skipped (browser-only)” |

Modules with probeFrom: 'browser': openclaw, ollama, argonaut, pentest, servers, workbench, network-discovery, jobs.

This eliminates ~24 false timeout failures that previously appeared when the CF Worker couldn’t reach internal services.
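
The skip logic might look something like the sketch below. Only the `probeFrom` field and the "skipped (browser-only)" message come from the text. The other manifest fields, the `probeModule` name, and the injected `fetchStatus` helper are assumptions made for illustration.

```typescript
// Only `probeFrom` is taken from the documentation; `id` and `apiRoute` are illustrative.
interface ModuleManifest {
  id: string;
  apiRoute: string;
  probeFrom?: "worker" | "browser"; // omitted means 'worker'
}

interface ProbeResult {
  id: string;
  status: "pass" | "fail" | "skipped";
  detail: string;
}

async function probeModule(
  mod: ModuleManifest,
  fetchStatus: (route: string) => Promise<number>, // hypothetical: resolves to an HTTP status code
): Promise<ProbeResult> {
  if ((mod.probeFrom ?? "worker") === "browser") {
    // The Worker cannot reach Tailscale-only services, so it does not try.
    return { id: mod.id, status: "skipped", detail: "skipped (browser-only)" };
  }
  try {
    const code = await fetchStatus(mod.apiRoute);
    return { id: mod.id, status: code < 400 ? "pass" : "fail", detail: `HTTP ${code}` };
  } catch (err) {
    return { id: mod.id, status: "fail", detail: String(err) };
  }
}
```

Returning a distinct "skipped" status rather than "pass" keeps the report honest: the module was not verified from the Worker, it was simply excluded.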

Console Audit Noise Filtering

The console audit filters known-harmless messages:

Noise patterns (suppressed): React DevTools prompts, Lit dev mode, Vite HMR, DevTools source map errors, browser extension URLs, ad-blocker blocks, Cytoscape wheelSensitivity warnings, Network API unavailable.

Trailing slash redirects: Pages that redirect from /path to /path/ are not counted as warnings. Known redirect paths (/dashboard, /dashboard/customize) are fully suppressed.

Cross-origin redirects: Pages that redirect to CF Access login are flagged as warnings (expected behavior for admin pages when not authenticated).
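
As a rough illustration of this kind of filtering, the sketch below matches console messages against a list of regular expressions and detects trailing-slash redirects. The patterns are guesses at typical message text for a few of the categories above, not the list used in audit.astro, and the function names are hypothetical.

```typescript
// Illustrative patterns only: the real list is not shown in this document.
const NOISE_PATTERNS: RegExp[] = [
  /Download the React DevTools/i,
  /Lit is in dev mode/i,
  /\[vite\]/i,
  /DevTools failed to load source map/i,
  /(chrome|moz)-extension:\/\//i,
  /ERR_BLOCKED_BY_CLIENT/i,
  /wheelSensitivity/i,
];

// True when a console message matches any known-harmless pattern.
function isNoise(message: string): boolean {
  return NOISE_PATTERNS.some((pattern) => pattern.test(message));
}

// A redirect that only appends a trailing slash (/path -> /path/) is not a warning.
function isTrailingSlashRedirect(requested: string, final: string): boolean {
  return requested !== final && requested.replace(/\/+$/, "") + "/" === final;
}
```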

Accessibility Spot-Checks

During the console audit, each iframe also runs basic accessibility checks:

| Check | Triggers Warning |
| --- | --- |
| <title> | Missing or empty document title |
| img[alt] | Images without alt attributes (count reported) |
| <h1> | No h1 element found on the page |

These run inside a try/catch — cross-origin pages (CF Access redirects) silently skip a11y checks.
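
The three checks and the try/catch guard could be sketched as below. `DocLike` is a hypothetical minimal view of a Document so the example also runs outside a browser; in the page itself the getter would presumably be something like `() => iframe.contentDocument`. The warning strings are illustrative.

```typescript
// Minimal structural view of a Document (an assumption made for testability).
interface DocLike {
  title: string;
  querySelector(selector: string): unknown;
  querySelectorAll(selector: string): { length: number };
}

function a11yWarnings(getDoc: () => DocLike | null): string[] {
  try {
    const doc = getDoc(); // may be null or may throw for cross-origin frames
    if (!doc) return [];
    const warnings: string[] = [];
    if (!doc.title.trim()) warnings.push("Missing or empty document title");
    const missingAlt = doc.querySelectorAll("img:not([alt])").length;
    if (missingAlt > 0) warnings.push(`${missingAlt} image(s) without alt attribute`);
    if (!doc.querySelector("h1")) warnings.push("No h1 element found");
    return warnings;
  } catch {
    return []; // cross-origin frame (for example a CF Access redirect): skip silently
  }
}
```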


Service Health Dashboard (Detail)

The Health Monitor at /admin/health (also Section 1 of the unified audit) is a service health dashboard that probes all backend services in parallel and renders a diagnostic card for each one. It uses a service registry architecture: all service metadata (icons, env vars, endpoints, diagnostic hints) lives in the API, so the frontend is a pure data renderer with no hardcoded service knowledge.

Architecture

Service Registry (API)

The API at /api/admin/health-check contains a service registry — an array of ServiceDef objects, each defining:

| Field | Purpose |
| --- | --- |
| id | Unique service identifier (e.g., forge-relay) |
| name | Display name |
| category | ai, infrastructure, or external |
| icon | FontAwesome class (e.g., fa-solid fa-brain) |
| envVars | Required environment variables |
| endpoint | Health check URL path |
| hints | Per-error-code diagnostic messages (keyed by HTTP code, code class, or default) |
| getCheck() | Probe function that returns status + response time |

When a service fails, the API returns an enriched diagnostic object:

{
  "httpCode": 401,
  "meaning": "Unauthorized — token invalid or missing",
  "hint": "Relay secret mismatch. Verify FORGE_RELAY_SECRET matches the remote service config.",
  "envVars": ["FORGE_RELAY_URL", "FORGE_RELAY_SECRET"],
  "endpoint": "/health"
}
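
The hint in that object is selected from the service's `hints` map. A minimal sketch of the lookup order (exact code, then code class, then default) follows. The key spellings "401", "4xx", and "default" are assumptions, since the document does not show how the real registry spells its keys, and `resolveHint` is a hypothetical name.

```typescript
// Assumed key format: exact code ("401"), code class ("4xx"), then "default".
type Hints = Record<string, string | undefined>;

function resolveHint(hints: Hints, httpCode: number): string | undefined {
  const exact = String(httpCode);
  const codeClass = `${Math.floor(httpCode / 100)}xx`;
  // Most specific match wins.
  return hints[exact] ?? hints[codeClass] ?? hints["default"];
}
```

For example, with hints for "401", "5xx", and "default", a 503 response would fall through to the "5xx" message and a 404 to the default.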

Single-Service Retry

The API supports GET /api/admin/health-check?service=forge-relay to re-probe a single service without re-checking all 12. This powers the per-card retry buttons and the self-healing system.

Response Shape

{
  "services": [ ... ],
  "meta": {
    "totalMs": 1234,
    "timestamp": "2026-02-28T13:20:16.000Z",
    "serviceCount": 12
  }
}
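
On the client side, the response shape and the single-service query parameter could be typed along these lines. The interface and function names are hypothetical. The `services` entries are elided in the sample above, so they are left as `unknown` here rather than guessed.

```typescript
interface HealthCheckMeta {
  totalMs: number;
  timestamp: string; // ISO 8601
  serviceCount: number;
}

interface HealthCheckResponse {
  services: unknown[]; // entry shape is elided in the sample, so left untyped
  meta: HealthCheckMeta;
}

// Build the probe URL: all services, or a single one when an id is given.
function healthCheckUrl(serviceId?: string): string {
  const base = "/api/admin/health-check";
  return serviceId ? `${base}?service=${encodeURIComponent(serviceId)}` : base;
}

// Runtime guard so a malformed payload is rejected before rendering.
function isHealthCheckResponse(value: unknown): value is HealthCheckResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Partial<HealthCheckResponse>;
  return (
    Array.isArray(v.services) &&
    typeof v.meta?.totalMs === "number" &&
    typeof v.meta?.timestamp === "string" &&
    typeof v.meta?.serviceCount === "number"
  );
}
```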

Frontend Features

Summary Bar

Four stat cards showing counts for Online, Degraded, Offline, and Unconfigured. Cards gain a colored glow border when their count is greater than zero.

Instructions Panel

Collapsible panel (click “How to Use”) with four help cards:

  • Live Monitoring — 15-second auto-refresh
  • Diagnostics — Error codes, causes, and env var guidance
  • Self-Healing — Automatic retry behavior
  • Configuration — How to add/update services via CF Pages env vars

Filter Tabs

Three filter tabs with live counts:

  • All — Every service
  • Issues — Degraded + Offline + Unconfigured
  • Online — Healthy services only
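
The tab rules above reduce to a small predicate. A sketch, with hypothetical names `matchesTab` and `tabCounts`:

```typescript
type ServiceStatus = "online" | "degraded" | "offline" | "unconfigured";
type FilterTab = "all" | "issues" | "online";

function matchesTab(status: ServiceStatus, tab: FilterTab): boolean {
  if (tab === "all") return true;
  if (tab === "online") return status === "online";
  return status !== "online"; // issues = degraded + offline + unconfigured
}

// Live counts shown on each tab.
function tabCounts(statuses: ServiceStatus[]): Record<FilterTab, number> {
  const tabs: FilterTab[] = ["all", "issues", "online"];
  const counts = { all: 0, issues: 0, online: 0 };
  for (const tab of tabs) {
    counts[tab] = statuses.filter((s) => matchesTab(s, tab)).length;
  }
  return counts;
}
```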

Service Cards

Each service renders as a glassmorphism card with:

  • 40px icon circle — Colored by status (green/amber/red)
  • Service name — Bold white text
  • Response time — Color-coded monospace (green <200ms, amber <500ms, red >500ms)
  • Timestamp — Last checked time
  • Status badge pill — With animated dot indicator
  • Response time bar — Gradient fill showing relative latency (capped at 1000ms)
  • 3px left accent border — Green (online), amber (degraded), red (offline), dashed (unconfigured)
  • Hover lift — Subtle translateY + shadow on mouse hover
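
The latency thresholds and the 1000ms bar cap could be expressed as below. Function names are hypothetical, and the sketch assumes a response of exactly 500ms counts as red, since the thresholds above leave that boundary unspecified.

```typescript
type LatencyColor = "green" | "amber" | "red";

// green <200ms, amber <500ms, red otherwise (exactly 500ms treated as red: an assumption).
function latencyColor(ms: number): LatencyColor {
  if (ms < 200) return "green";
  if (ms < 500) return "amber";
  return "red";
}

// Bar fill as a percentage of the 1000ms cap.
function latencyBarPercent(ms: number): number {
  return Math.min(Math.max(ms, 0), 1000) / 10;
}
```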

Diagnostic Panels

For any non-online service, the card expands to show:

  • Error headline — HTTP code + human-readable meaning (e.g., “HTTP 401 — Unauthorized”)
  • Fix hint — Actionable troubleshooting text from the service registry
  • Env var pills — Purple monospace tags showing which variables to check
  • Endpoint pill — Blue monospace tag showing which URL was probed

Retry Buttons

Each failed service card has a retry button. Clicking it:

  1. Shows a loading spinner on the button
  2. Adds a purple pulse animation to the card
  3. Re-probes only that service via ?service=id
  4. If the service recovers, flashes the card green

The toolbar also has a “Retry All Failed” button that retries every non-online service in parallel.

Self-Healing

On initial load (and after each full refresh), the system automatically schedules retries for failed services:

  • Up to 3 attempts with 8-second delays between each
  • A status indicator in the toolbar shows the countdown and attempt number
  • Only retries degraded and offline services (not unconfigured)
  • Stops early if all services recover
  • Shows “All services recovered!” or “Manual intervention needed” when done

Services Monitored

AI Services

| Service | Icon | Endpoint | Env Vars |
| --- | --- | --- | --- |
| OpenClaw Gateway | fa-brain | /v1/models | OPENCLAW_API_TOKEN, OPENCLAW_API_URL |
| Ollama | fa-microchip | /api/tags | OLLAMA_API_URL |
| Argonaut Daemon | fa-robot | /health | ARGONAUT_DAEMON_URL, ARGONAUT_DAEMON_TOKEN |

Infrastructure

| Service | Icon | Endpoint | Env Vars |
| --- | --- | --- | --- |
| Build Swarm Gateway | fa-cubes | /status | GATEWAY_API_URL |
| Forge Relay | fa-tower-broadcast | /health | FORGE_RELAY_URL, FORGE_RELAY_SECRET |
| Pentest Sentinel | fa-shield-halved | /api/health | PENTEST_VPS_DAEMON_URL, PENTEST_VPS_API_KEY |
| Meridian-Host AdminBox | fa-hard-drive | /api/health | MM_ARGOBOX_URL, MM_ARGOBOX_TOKEN |
| Tarn-Host AdminBox | fa-server | /api/health | TITAN_ADMINBOX_URL, TITAN_ADMINBOX_TOKEN |
| Uptime Kuma | fa-chart-line | /api/status-page/public | UPTIME_KUMA_URL |

External APIs

| Service | Icon | Endpoint | Env Vars |
| --- | --- | --- | --- |
| Twilio | fa-phone | /2010-04-01/Accounts | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN |
| ElevenLabs | fa-volume-high | /v1/voices | ELEVENLABS_API_KEY |
| Cloudflare | fa-cloud | /client/v4/user/tokens/verify | CF_API_TOKEN |

CSS Architecture

The page uses <style is:global> because all service cards are generated dynamically via JavaScript innerHTML. Astro’s default scoped styles add data-astro-cid-* attributes to CSS selectors — dynamically created DOM elements don’t receive these attributes, so scoped styles silently fail.

All class names are prefixed to avoid collisions:

  • .hm-* — Page layout (header, summary, toolbar, grid)
  • .svc-* — Service card elements (icon, name, badge, retry)
  • .diag-* — Diagnostic panel elements (error, hint, pills)

Common Degraded States

| Service | Typical Error | Cause | Fix |
| --- | --- | --- | --- |
| Forge Relay | HTTP 401 | Secret sent as wrong header type | Auth header must be Authorization: Bearer <secret> (not X-Relay-Secret) |
| Pentest Sentinel | HTTP 401 | Wrong auth header or path | Uses X-Api-Key header (not Bearer) at /api/health (not /health) |
| Meridian-Host | HTTP 404 | Wrong health endpoint path | Health endpoint is /api/health (not /health) |
| Tarn-Host AdminBox | HTTP 404 | Wrong health endpoint path | Health endpoint is /api/health (not /health) |
| Uptime Kuma | HTTP 403/404 | Wrong status page slug or URL | Slug is public (not heartbeat), URL should be https://status.Arcturus-Prime.com |
| ElevenLabs | HTTP 401 | Endpoint requires elevated permissions | Use /v1/voices instead of /v1/user/subscription (doesn’t need user_read scope) |

Adding a New Service

  1. Add a new ServiceDef entry in the SERVICES array in src/pages/api/admin/health-check.ts
  2. Define id, name, category, icon, envVars, endpoint, hints, and getCheck()
  3. The frontend renders it automatically — no frontend changes needed
  4. Add the env vars to CF Pages > Settings > Environment Variables
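
A new registry entry might look roughly like the sketch below. The `ServiceDef` shape is inferred from the field table earlier on this page and may not match the real type in health-check.ts. In particular, the `getCheck(env)` signature, the `CheckResult` type, the hint keys, and the `EXAMPLE_API_URL` variable are all hypothetical.

```typescript
// Hypothetical shapes inferred from the field table; the real types may differ.
interface CheckResult {
  status: "online" | "degraded" | "offline" | "unconfigured";
  responseMs: number;
  httpCode?: number;
}

interface ServiceDef {
  id: string;
  name: string;
  category: "ai" | "infrastructure" | "external";
  icon: string;
  envVars: string[];
  endpoint: string;
  hints: Record<string, string>;
  getCheck(env: Record<string, string | undefined>): Promise<CheckResult>;
}

const exampleService: ServiceDef = {
  id: "example-service",
  name: "Example Service",
  category: "external",
  icon: "fa-solid fa-plug",
  envVars: ["EXAMPLE_API_URL"],
  endpoint: "/health",
  hints: {
    "401": "Token rejected. Check the credentials configured for this service.",
    "5xx": "Service is reachable but failing internally.",
    default: "Service unreachable. Check EXAMPLE_API_URL.",
  },
  async getCheck(env) {
    const base = env.EXAMPLE_API_URL;
    // Missing env var: report unconfigured without making a request.
    if (!base) return { status: "unconfigured", responseMs: 0 };
    const started = Date.now();
    try {
      const res = await fetch(new URL("/health", base).toString());
      const responseMs = Date.now() - started;
      return { status: res.ok ? "online" : "degraded", responseMs, httpCode: res.status };
    } catch {
      return { status: "offline", responseMs: Date.now() - started };
    }
  },
};
```

Keeping the metadata and the probe in one object is what lets the frontend stay a pure renderer: appending an entry like this to the SERVICES array is the only code change.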

Files

| File | Purpose |
| --- | --- |
| src/pages/admin/audit.astro | Unified audit page: all 4 sections + self-healing + a11y checks |
| src/pages/admin/health.astro | Legacy: standalone service health dashboard |
| src/pages/admin/site-test.astro | Legacy: standalone HTTP smoke test |
| src/pages/admin/system-test.astro | Legacy: standalone system validation |
| src/pages/admin/f12-audit.astro | Legacy: standalone console audit |
| src/pages/api/admin/health-check.ts | API: service registry + probes (12 services) |
| src/pages/api/admin/system-test.ts | API: deep probes + probeFrom skip logic |
| src/config/modules/health-monitor.ts | Module manifest (nav → /admin/audit) |
| src/config/module-manifest.ts | ModuleManifest type, includes probeFrom field |

Tags: admin, health, monitoring, diagnostics, audit, testing