Monitoring & Observability

Monitoring stack across both sites including Glances, Grafana, Uptime Kuma, Netdata, Dozzle, and Tautulli

February 23, 2026

Monitoring is distributed across both sites with overlapping coverage. There is no single centralized monitoring platform; instead, several specialized tools each cover one aspect of observability. Glances provides real-time host metrics on every significant machine, Grafana hosts the dashboards, Uptime Kuma tracks service availability, Netdata provides deep per-host telemetry, Dozzle provides per-host Docker log viewing, and Tautulli monitors Plex specifically.

Monitoring Architecture Overview

┌────────────────────────────────────────────────────────────────────┐
│                        MONITORING TOPOLOGY                         │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  Milky Way (10.42.0.0/24)         Andromeda (192.168.20.0/24)      │
│  ────────────────────────         ───────────────────────────      │
│                                                                    │
│  Altair-Link (10.42.0.199)        Meridian-Host (192.168.20.50)    │
│  ├─ Homepage      :3001           ├─ Grafana       :3001           │
│  ├─ Grafana       :3002           ├─ Uptime Kuma   :3002           │
│  ├─ Uptime Kuma   :3003           ├─ Netdata       :19999          │
│  ├─ Netdata       :19999          ├─ Glances (v4)  :61208          │
│  ├─ Glances (v3)  :61208          ├─ Dozzle        :9999           │
│  ├─ Dozzle        :9999           └─ Tautulli      :8181           │
│  └─ RustDesk      :21115-17                                        │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

Glances (System Metrics)

Glances runs on every significant host, providing a web-based overview of CPU, memory, disk I/O, network, and process activity. It is the go-to tool for a quick health check on any individual machine.

Deployment

| Host | IP | Port | Version | Access URL |
| --- | --- | --- | --- | --- |
| Altair-Link | 10.42.0.199 | 61208 | v3 | http://10.42.0.199:61208 |
| Meridian-Host | 192.168.20.50 | 61208 | v4 | http://192.168.20.50:61208 |
| Capella-Outpost | 10.42.0.100 | 61208 | varies | http://10.42.0.100:61208 |
| Other hosts | various | 61208 | varies | http://<host-ip>:61208 |

Version Differences

  • Glances v3 (Altair-Link): The legacy web UI. Functional but simpler layout. Runs as a Docker container.
  • Glances v4 (Meridian-Host and newer deployments): Redesigned web UI with improved charts and responsiveness. Preferred for new deployments.

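Both versions expose a REST API on port 61208, so either can back a Grafana data source regardless of UI version. The endpoint paths are version-prefixed (/api/3/ on v3, /api/4/ on v4), so a query written for one host needs its prefix changed for the other.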
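A minimal sketch of querying that API from a shell, assuming the version-prefixed paths used by upstream Glances (`/api/3/...` on v3, `/api/4/...` on v4). The function names `glances_url` and `glances_get` are illustrative, and the per-host versions come from the deployment table.

```shell
#!/bin/sh
# Build a Glances REST URL from a host, an API version, and a plugin name.
# glances_url and glances_get are illustrative helper names.
glances_url() {
  host=$1 api_version=$2 plugin=$3
  printf 'http://%s:61208/api/%s/%s\n' "$host" "$api_version" "$plugin"
}

# Fetch one plugin's stats as JSON; fails on HTTP errors or after 5 seconds.
glances_get() {
  curl -fsS --max-time 5 "$(glances_url "$1" "$2" "$3")"
}

# Example calls (need network access to the hosts):
#   glances_get 10.42.0.199 3 mem      # Altair-Link, Glances v3
#   glances_get 192.168.20.50 4 mem    # Meridian-Host, Glances v4
```

Keeping the version as an explicit argument makes it obvious which hosts still need attention if Altair-Link is ever upgraded to v4.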

Configuration

On these hosts Glances is started in web server mode (the -w option, passed in through GLANCES_OPT):

# Docker run example (v4)
docker run -d \
  --name glances \
  --restart unless-stopped \
  --pid host \
  --network host \
  --privileged \
  -e GLANCES_OPT="-w" \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  nicolargo/glances:latest-full

# v3 uses the same approach with an older image tag

The --pid host and --privileged flags give Glances full visibility into the host’s processes and hardware. The Docker socket mount enables container monitoring.

Use Cases

  • Quick health check: Open http://<host>:61208 to see if a machine is under load, out of memory, or experiencing disk pressure.
  • Process hunting: Identify runaway processes consuming CPU or memory.
  • Network throughput: Check real-time bandwidth on each interface.
  • Docker container stats: See per-container CPU and memory when the Docker socket is mounted.

Grafana (Dashboards)

Grafana provides dashboarding and visualization. Two instances run independently, one per site.

Instances

| Host | IP | Port | Access URL | Scope |
| --- | --- | --- | --- | --- |
| Altair-Link | 10.42.0.199 | 3002 | http://10.42.0.199:3002 | Milky Way |
| Meridian-Host | 192.168.20.50 | 3001 | http://192.168.20.50:3001 | Andromeda |

Data Sources

Grafana pulls metrics from multiple sources:

  • Netdata (via Netdata’s built-in Prometheus exporter or direct API)
  • Glances API (REST endpoints on port 61208)
  • Prometheus (if deployed, scraping exporters)
  • InfluxDB (if deployed, for time-series storage)

Key Dashboards

| Dashboard | Host | Purpose |
| --- | --- | --- |
| Host Overview | Altair-Link | CPU, memory, disk for Milky Way hosts |
| Docker Containers | Altair-Link | Container resource usage |
| Build Swarm Status | Altair-Link | Drone health, build queue, package counts |
| Unraid Array | Meridian-Host | Disk utilization, temperatures, parity status |
| Network Throughput | Both | Bandwidth across interfaces and Tailscale |

Configuration

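Grafana runs as a Docker container with a persistent volume for dashboards and data source configs. The example shows the Altair-Link port mapping; the Meridian-Host instance publishes port 3001 instead: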

docker run -d \
  --name grafana \
  --restart unless-stopped \
  -p 3002:3000 \
  -v grafana-data:/var/lib/grafana \
  grafana/grafana-oss:latest

Dashboards are configured through the web UI. There is no Grafana-as-code or provisioning setup currently — dashboards are created and edited manually.
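Because nothing is provisioned as code, a quick way to confirm that an instance came back intact after a container restart or image update is Grafana's unauthenticated `/api/health` endpoint, which reports the state of its database. The sketch below assumes `curl` is available; the helper names are illustrative.

```shell
#!/bin/sh
# grafana_db_ok reads an /api/health JSON body on stdin and succeeds only
# if it reports "database": "ok". (Illustrative helper name.)
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# grafana_health fetches the health endpoint of one instance (host:port)
# and checks the reported database state.
grafana_health() {
  curl -fsS --max-time 5 "http://$1/api/health" | grafana_db_ok
}

# Example calls (need network access to the hosts):
#   grafana_health 10.42.0.199:3002   && echo "Milky Way Grafana OK"
#   grafana_health 192.168.20.50:3001 && echo "Andromeda Grafana OK"
```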

Uptime Kuma (Availability Monitoring)

Uptime Kuma tracks the availability of services across both networks. It probes HTTP endpoints, TCP ports, and ICMP targets at regular intervals and sends notifications when something goes down.

Instances

| Host | IP | Port | Access URL | Scope |
| --- | --- | --- | --- | --- |
| Altair-Link | 10.42.0.199 | 3003 | http://10.42.0.199:3003 | Milky Way + cross-site services |
| Meridian-Host | 192.168.20.50 | 3002 | http://192.168.20.50:3002 | Andromeda services |

Monitored Targets

The Altair-Link instance monitors:

  • All Docker services on Altair-Link (HTTP health checks)
  • Proxmox web UIs (Izar-Host, Arcturus-Prime, Tarn-Host via Tailscale)
  • Build swarm gateway (port 8090)
  • Cross-site services on Meridian-Host (via Tailscale)
  • Cloudflare Tunnel health
  • External endpoints (public-facing services)

The Meridian-Host instance monitors:

  • All Docker services on Meridian-Host (HTTP health checks)
  • Plex instances on Polaris-Media (ports 32400, 32401)
  • Synology DSM interfaces (Cassiel-Silo, Mobius-Silo)
  • ASUS gateway reachability
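For a manual spot check of the same kinds of targets (for example, to tell whether a red monitor reflects the target or the path from the Uptime Kuma host), the three probe types map onto standard tools. This is a sketch with illustrative function names, not a description of how Uptime Kuma works internally; it assumes `curl`, `nc`, and a Linux `ping`, and it does not handle IPv6 literals.

```shell
#!/bin/sh
# probe_type picks a check for a target: URLs get an HTTP check,
# host:port pairs a TCP connect, and bare hosts an ICMP ping.
probe_type() {
  case $1 in
    http://*|https://*) echo http ;;
    *:*)                echo tcp ;;
    *)                  echo ping ;;
  esac
}

# probe runs the matching check and returns 0 when the target responds.
probe() {
  case $(probe_type "$1") in
    http) curl -fsS -o /dev/null --max-time 5 "$1" ;;
    tcp)  nc -z -w 5 "${1%%:*}" "${1##*:}" </dev/null ;;
    ping) ping -c 1 -W 2 "$1" >/dev/null ;;   # -W is seconds on Linux
  esac
}

# Example calls (need network access to the targets):
#   probe http://10.42.0.199:3002 || echo "Grafana on Altair-Link is down"
#   probe 192.168.20.201:32400    || echo "Plex on Polaris-Media is down"
```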

Notifications

Uptime Kuma supports multiple notification channels. The following are in use or available, depending on the instance:

  • Browser notifications (when the Uptime Kuma UI is open)
  • Discord webhook (if configured)
  • Email (if SMTP is configured)

Check each instance’s Settings > Notifications page for active notification channels.

Status Pages

Uptime Kuma can generate public or private status pages showing the health of monitored services. These can be shared with specific people (e.g., a status page for dad showing Plex and NAS health on the Andromeda site).

Netdata (Deep Telemetry)

Netdata provides per-second granularity metrics with automatic anomaly detection. It collects hundreds of metrics per host out of the box with zero configuration.

Instances

| Host | IP | Port | Access URL |
| --- | --- | --- | --- |
| Altair-Link | 10.42.0.199 | 19999 | http://10.42.0.199:19999 |
| Meridian-Host | 192.168.20.50 | 19999 | http://192.168.20.50:19999 |

Capabilities

  • Per-second metrics: CPU, memory, disk I/O, network, interrupts, softnet, and more at 1-second resolution.
  • Automatic application monitoring: Detects running applications (nginx, Docker, systemd services) and creates dashboards automatically.
  • Anomaly detection: Built-in ML-based anomaly detection highlights unusual behavior.
  • Docker monitoring: Per-container metrics when the Docker socket is mounted.
  • Disk health: SMART data monitoring for physical drives.
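The same metrics are reachable without the dashboard through the Netdata agent's `/api/v1/data` endpoint. A minimal sketch, assuming the standard `system.cpu` chart is present on the agent; the helper names are illustrative.

```shell
#!/bin/sh
# netdata_data_url builds a /api/v1/data query that averages one chart
# over the last N seconds and returns it as a single JSON point.
netdata_data_url() {
  host=$1 chart=$2 seconds=$3
  printf 'http://%s:19999/api/v1/data?chart=%s&after=-%s&points=1&group=average&format=json\n' \
    "$host" "$chart" "$seconds"
}

# netdata_avg fetches that query; fails on HTTP errors or after 5 seconds.
netdata_avg() {
  curl -fsS --max-time 5 "$(netdata_data_url "$1" "$2" "$3")"
}

# Example call (needs network access to the host):
#   netdata_avg 10.42.0.199 system.cpu 60   # last-minute CPU average
```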

Netdata vs. Glances

Both provide host metrics, but they serve different purposes:

| Feature | Netdata | Glances |
| --- | --- | --- |
| Resolution | 1 second | ~3 seconds |
| History | Hours to days (local DB) | Real-time only |
| Depth | Hundreds of metrics | Key metrics overview |
| Setup | Agent-based, auto-discovers | Single container, minimal config |
| Use case | Deep investigation | Quick health check |

Use Glances for “is this host okay?” and Netdata for “why is this host slow?”

Dozzle (Docker Logs)

Dozzle provides real-time Docker log viewing through a web UI. It reads from the Docker socket and streams container logs in the browser.

Instances

| Host | IP | Port | Access URL |
| --- | --- | --- | --- |
| Altair-Link | 10.42.0.199 | 9999 | http://10.42.0.199:9999 |
| Meridian-Host | 192.168.20.50 | 9999 | http://192.168.20.50:9999 |

Features

  • Real-time streaming: Logs appear as they are written, similar to docker logs -f.
  • Multi-container view: See logs from all containers simultaneously or filter to a specific one.
  • Search: Full-text search across log output.
  • No agents: Dozzle connects directly to the Docker socket — no sidecar containers or log shipping required.

Configuration

docker run -d \
  --name dozzle \
  --restart unless-stopped \
  -p 9999:8080 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  amir20/dozzle:latest

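Dozzle only reads logs and container metadata; it does not modify containers or their configurations. The Docker socket is mounted with :ro, although that flag only protects the socket file itself and does not restrict the Docker API calls a process can make through it.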

Homepage (Dashboard)

  • Host: Altair-Link (10.42.0.199)
  • Port: 3001
  • Access: http://10.42.0.199:3001

Homepage is the central services dashboard. It is not strictly a monitoring tool, but it aggregates status widgets from other monitoring tools and provides quick-access links to every service across both networks.

What It Shows

  • Service status indicators (up/down via HTTP checks)
  • Resource usage widgets (CPU, memory, disk from Glances/Netdata APIs)
  • Docker container status
  • Quick links to all services organized by host and category
  • Weather, bookmarks, and other widget integrations

Homepage is the first page loaded when checking on the infrastructure. If a service is down, the Homepage widget turns red before you even open Uptime Kuma.

Tautulli (Plex Monitoring)

  • Host: Meridian-Host (192.168.20.50)
  • Port: 8181
  • Access: http://192.168.20.50:8181 or http://100.64.0.15.30:8181

Tautulli monitors the Plex Media Server instances running on Polaris-Media (192.168.20.201). It connects to the Plex API and tracks:

  • Active streams: Who is watching, what they are watching, stream quality, transcode status.
  • History: Complete playback history with timestamps, users, and content.
  • Library statistics: Media count, recently added, most played.
  • Notifications: Alerts for new content, stream issues, or server health.

Monitored Plex Instances

| Instance | Host | Port | Library |
| --- | --- | --- | --- |
| Kraken-commander | Polaris-Media (192.168.20.201) | 32400 | Primary |
| Kraken-logistics-officer | Polaris-Media (192.168.20.201) | 32401 | Secondary |

Tautulli provides the answer to “is anyone watching Plex right now?” and “what did people watch this week?” — useful for gauging whether Plex server changes (transcoding settings, library updates) are working correctly.
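The first of those questions can also be answered from a script through Tautulli's HTTP API (`/api/v2` with the `get_activity` command). The sketch below assumes an API key has been generated in Tautulli's settings and exported as `TAUTULLI_API_KEY`, which is a hypothetical variable name; the helper names are illustrative as well.

```shell
#!/bin/sh
# tautulli_url builds an API v2 URL for the Meridian-Host instance from an
# API key and a command name.
tautulli_url() {
  printf 'http://192.168.20.50:8181/api/v2?apikey=%s&cmd=%s\n' "$1" "$2"
}

# stream_count reads a get_activity response on stdin and prints the value
# of its "stream_count" field, whether quoted or numeric.
stream_count() {
  sed -n 's/.*"stream_count"[[:space:]]*:[[:space:]]*"\{0,1\}\([0-9][0-9]*\).*/\1/p' |
    head -n 1
}

# Example call (needs network access and a valid key):
#   curl -fsS "$(tautulli_url "$TAUTULLI_API_KEY" get_activity)" | stream_count
```

A JSON-aware tool such as `jq` would be more robust than `sed` if it is installed on the host.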

RustDesk (Remote Access)

  • Host: Altair-Link (10.42.0.199)
  • Tailscale: 100.64.0.234.88
  • Ports: 21115, 21116, 21117

RustDesk is a self-hosted remote desktop solution. The relay server runs on Altair-Link, and clients on any machine connect through it for remote desktop sessions.

Port Breakdown

| Port | Protocol | Function |
| --- | --- | --- |
| 21115 | TCP | NAT type testing |
| 21116 | TCP/UDP | ID registration and hole punching |
| 21117 | TCP | Relay traffic |
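When a client cannot connect, a first step is confirming that the TCP ports listed here are reachable from that client. A rough sketch, assuming `nc` is installed; the function names are illustrative and the relay address is left as a placeholder.

```shell
#!/bin/sh
# rustdesk_ports lists the port/protocol pairs used by the relay server.
rustdesk_ports() {
  printf '%s\n' 21115/tcp 21116/tcp 21116/udp 21117/tcp
}

# check_rustdesk_tcp probes only the TCP entries (a plain connect test
# cannot confirm UDP) and prints one status line per port.
check_rustdesk_tcp() {
  host=$1
  rustdesk_ports | grep '/tcp$' | while IFS=/ read -r port proto; do
    if nc -z -w 3 "$host" "$port" </dev/null; then
      echo "$port/$proto open"
    else
      echo "$port/$proto closed"
    fi
  done
}

# Example call (needs network access to the relay):
#   check_rustdesk_tcp <relay-address>
```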

Client Configuration

RustDesk clients are configured to point at the self-hosted relay:

  • ID Server: 100.64.0.234.88
  • Relay Server: 100.64.0.234.88

This keeps all remote desktop traffic within the Tailscale mesh — no data flows through RustDesk’s public relay servers.

Monitoring Gaps and Known Issues

No Centralized Log Aggregation

Docker logs are visible per-host via Dozzle, but there is no centralized logging solution (ELK, Loki, etc.) that aggregates logs from all hosts into a single searchable store. Non-Docker services (Proxmox, bare-metal drones) have no log forwarding at all.

No Alerting Pipeline

Uptime Kuma provides basic availability alerting, but there is no structured alerting pipeline (PagerDuty, OpsGenie) for critical failures. Notifications go to Discord or email if configured, but there are no escalation policies or on-call rotations (it is a homelab, after all).

No Long-Term Metrics Retention

Netdata retains metrics for hours to days depending on available disk. Grafana dashboards show real-time data, but long-term trend analysis requires a proper time-series database (InfluxDB, or Prometheus with extended retention). This is a known gap.

Host Coverage

Not all hosts run the full monitoring stack. The monitoring tools are deployed primarily on Altair-Link and Meridian-Host. Other hosts (Izar-Host, Tarn-Host, Tau-Host, Capella-Outpost) may have Glances running but do not have Grafana, Uptime Kuma, or Netdata locally. They are monitored remotely by the Altair-Link and Meridian-Host instances.

Future Improvements

  • Deploy Prometheus + Grafana Loki for centralized metrics and log aggregation.
  • Add node_exporter on all Linux hosts for consistent Prometheus scraping.
  • Set up automated alerting for disk space, array health, and Tailscale peer connectivity.
  • Long-term metrics storage with InfluxDB or Prometheus with extended retention.
  • Unified status page for both sites accessible from a single URL.
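As a starting point for the disk-space item in this list, the check itself can be a few lines of shell run from cron, with delivery (for example, a Discord webhook) added later. This is a sketch only: `disks_over` is an illustrative name, the 90% threshold is an arbitrary example, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# disks_over reads `df -P` output on stdin and prints "mountpoint percent"
# for every filesystem whose usage is at or above the given threshold.
disks_over() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                     # "95%" -> "95"
    if (use + 0 >= limit + 0) print $6, use
  }'
}

# Example call:
#   df -P | disks_over 90
```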