Admin Modules

Ollama Model Manager

Local LLM model lifecycle management with graceful offline handling, VRAM monitoring, and built-in troubleshooting

February 27, 2026

The Ollama page at /admin/ollama manages local LLM models running on the workstation through the Ollama REST API. It provides model lifecycle management (pull, view, delete), VRAM monitoring, popular model suggestions, and graceful offline handling with auto-retry.

Architecture

Admin UI → /api/admin/ollama-manage → Ollama REST API (localhost:11434)
                                       ├── /api/tags (model list)
                                       ├── /api/version (version info)
                                       ├── /api/ps (running models / VRAM)
                                       ├── /api/show (model details)
                                       ├── /api/pull (download with streaming)
                                       └── /api/delete (remove model)

The API route always returns HTTP 200 with a status field (connected or offline). It never returns 502, so the page always renders gracefully regardless of whether Ollama is running.
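The "always HTTP 200" behavior can be sketched as a small helper that folds any probe failure into the JSON payload instead of an error status code. This is an illustrative sketch, not the route's actual code; the type and function names are assumptions.

```typescript
// Any probe failure becomes a { status: "offline" } payload served with
// HTTP 200, so the client page always gets parseable JSON to render.
type OllamaStatus =
  | { status: "connected"; url: string }
  | { status: "offline"; url: string; error: string };

function toStatusPayload(
  url: string,
  probe: { ok: boolean; error?: string }
): OllamaStatus {
  return probe.ok
    ? { status: "connected", url }
    : { status: "offline", url, error: probe.error ?? "Ollama is not reachable" };
}
```

The route would serialize this payload with `new Response(JSON.stringify(...), { status: 200 })` regardless of which branch is taken.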

Page Layout

Stats Row

Five stat cards:

  • Models — count of installed models
  • Disk Used — total size of all models
  • VRAM — current GPU VRAM usage by loaded models (or “idle”)
  • Version — Ollama server version
  • Endpoint — configured API URL
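The Disk Used and VRAM cards both render byte counts from the GET payload. A minimal sketch of that formatting, assuming binary (1024-based) units and "idle" for zero VRAM; the helper names are illustrative:

```typescript
// Format a byte count into a human-readable string for the stat cards.
function formatBytes(bytes: number): string {
  if (bytes <= 0) return "0 B";
  const units = ["B", "KB", "MB", "GB", "TB"];
  const i = Math.min(Math.floor(Math.log2(bytes) / 10), units.length - 1);
  return `${(bytes / 2 ** (10 * i)).toFixed(1)} ${units[i]}`;
}

// VRAM card shows "idle" when no model is loaded in GPU memory.
function vramLabel(vramTotal: number): string {
  return vramTotal > 0 ? formatBytes(vramTotal) : "idle";
}
```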

Pull Model

Input field for model names (e.g., llama3.2:3b, nomic-embed-text) with real-time SSE streaming progress bar. Supports Enter key to submit.
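Ollama's /api/pull streams one JSON object per line; during download phases those objects carry `total` and `completed` byte counts. A sketch of turning one streamed line into a percentage for the progress bar (the function name is illustrative):

```typescript
// Shape of one streamed progress line from Ollama's /api/pull.
interface PullProgress {
  status: string;
  total?: number;     // total bytes for the current layer (download phases only)
  completed?: number; // bytes downloaded so far
}

// Returns a 0-100 percentage, or null for non-download phases
// (e.g. "verifying sha256 digest", "success").
function pullPercent(line: string): number | null {
  const p: PullProgress = JSON.parse(line);
  if (!p.total || p.completed === undefined) return null;
  return Math.round((p.completed / p.total) * 100);
}
```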

Running Models

Shows models currently loaded in GPU VRAM with:

  • Model name
  • VRAM memory usage
  • Unload timer (Ollama auto-unloads after ~5 min inactivity)

Empty when no model has been recently queried.

Installed Models

List of all downloaded models with:

  • Model name — click to view full details
  • Family badge (purple) — model family (qwen2, llama, nomic-bert, etc.)
  • Quantization badge (cyan) — quantization level (Q4_K_M, F16, etc.)
  • Size and parameters — disk size and parameter count
  • Modified date — relative timestamp
  • Copy button — copies model name to clipboard
  • Delete button — removes model with confirmation modal

Popular Models

Grid of 8 common models for one-click pulling:

Model               Description                Size
llama3.2:3b         Fast lightweight chat      2.0 GB
llama3.1:8b         Balanced chat model        4.7 GB
qwen2.5:14b         Strong multilingual        9.0 GB
qwen3:14b           Latest Qwen reasoning      9.3 GB
deepseek-r1:14b     Reasoning-focused          9.0 GB
qwen2.5-coder:14b   Code generation            9.0 GB
nomic-embed-text    Text embeddings for RAG    274 MB
gemma2:9b           Google open model          5.4 GB

Already-installed models are grayed out. Clicking an available model auto-fills the pull input and starts downloading.
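The gray-out check can be sketched as a simple name comparison. One wrinkle worth handling: Ollama stores a bare tag like nomic-embed-text as nomic-embed-text:latest, so the grid entry should match that form too. The function name is illustrative:

```typescript
// True when a popular-model tag is already present in the installed list.
// A bare tag (no ":version") also matches its ":latest" form, since that
// is how Ollama names models pulled without an explicit tag.
function isInstalled(tag: string, installed: string[]): boolean {
  return installed.some((name) => name === tag || name === `${tag}:latest`);
}
```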

Instructions Section

Collapsible panel explaining all features: Pull Models, Running Models, Model Details, Copy & Delete. Notes on model storage location and service configuration.

Troubleshooting Section

Collapsible panel with 5 common issues:

  1. Ollama not responding — rc-service ollama status, restart command, curl test
  2. Pull stuck or failing — disk space check, expected times for large models
  3. “Offline” on Cloudflare Pages — expected behavior, Ollama is local-only
  4. Model loaded but slow — VRAM overflow to system RAM, restart to unload
  5. Wrong endpoint — OLLAMA_API_URL env var configuration

Auto-opens when Ollama is detected as offline.

Offline Handling

When Ollama is unreachable, the page displays:

  • Amber “offline” status badge (not red “error”)
  • Centered offline hero with pulsing icon
  • The configured endpoint URL that failed
  • Restart command hint: sudo rc-service ollama restart
  • Troubleshooting section auto-opens
  • Auto-retry every 15 seconds with countdown timer in header
  • Automatically transitions to online state when Ollama becomes available

When online, a 60-second health check runs in the background to detect if Ollama goes down.
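The 15-second retry countdown can be modeled as a pure per-second tick, which also makes the header timer easy to render; the same shape works for the 60-second online health check. A sketch with illustrative names:

```typescript
// Offline auto-retry: count down one second per tick; at zero, signal a
// retry and reset to the 15-second interval.
const RETRY_INTERVAL = 15;

interface RetryState {
  secondsLeft: number;  // shown in the header countdown
  shouldRetry: boolean; // true exactly on the tick that fires a retry
}

function tick(state: RetryState): RetryState {
  const next = state.secondsLeft - 1;
  return next <= 0
    ? { secondsLeft: RETRY_INTERVAL, shouldRetry: true }
    : { secondsLeft: next, shouldRetry: false };
}
```

In the page this would be driven by setInterval(..., 1000), with a successful retry clearing the interval and switching to the online health check.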

API Route

GET /api/admin/ollama-manage

Always returns HTTP 200. Fetches /api/tags, /api/version, and /api/ps in parallel using Promise.allSettled().
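A hedged sketch of how the three settled probes could be folded into the response shape: any probe may fail independently, and a failed probe just leaves its fields at a default. Field names like size_vram follow Ollama's /api/ps response; the function and Settled type are illustrative:

```typescript
// Structural stand-in for PromiseSettledResult<T>.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

interface Model { name: string; size: number }

// Combine the /api/tags, /api/version, and /api/ps probes. Only the tags
// probe decides connected vs offline; the others degrade gracefully.
function combineProbes(
  tags: Settled<{ models: Model[] }>,
  version: Settled<{ version: string }>,
  ps: Settled<{ models: { size_vram: number }[] }>
) {
  const models = tags.status === "fulfilled" ? tags.value.models : [];
  return {
    status: tags.status === "fulfilled" ? "connected" : "offline",
    modelCount: models.length,
    totalSize: models.reduce((sum, m) => sum + m.size, 0),
    version: version.status === "fulfilled" ? version.value.version : undefined,
    vramTotal:
      ps.status === "fulfilled"
        ? ps.value.models.reduce((sum, m) => sum + m.size_vram, 0)
        : 0,
  };
}
```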

Response shape:

{
  "status": "connected",
  "url": "http://localhost:11434",
  "modelCount": 7,
  "totalSize": 41830817642,
  "version": "0.16.1",
  "models": [...],
  "runningModels": [...],
  "vramTotal": 6243710752
}

When offline:

{
  "status": "offline",
  "url": "http://localhost:11434",
  "error": "Ollama is not reachable"
}

POST /api/admin/ollama-manage

Actions via { action: "..." } body:

Action           Description                   Ollama Endpoint
list-models      List installed models         GET /api/tags
show-model       Full model details            POST /api/show
pull-model       Download with SSE streaming   POST /api/pull
delete-model     Remove a model                DELETE /api/delete
running-models   Models loaded in VRAM         GET /api/ps
health           Connectivity check            GET /api/tags
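A client-side helper for these actions could build the POST request like the sketch below. The { action } body shape matches the route; the helper name and return shape are illustrative:

```typescript
type OllamaAction =
  | "list-models" | "show-model" | "pull-model"
  | "delete-model" | "running-models" | "health";

// Build the fetch options for a POST to /api/admin/ollama-manage.
// Extra params (e.g. a model name) are merged into the JSON body.
function buildActionRequest(
  action: OllamaAction,
  params: Record<string, unknown> = {}
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ action, ...params }),
  };
}
```

Usage would be along the lines of `fetch("/api/admin/ollama-manage", buildActionRequest("delete-model", { model: "llama3.1:8b" }))`.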

Configuration

Variable         Default                  Description
OLLAMA_API_URL   http://localhost:11434   Ollama REST API base URL

Model storage: /mnt/AllShare/ollama-models/ (configured in /etc/conf.d/ollama).

Ollama runs as an OpenRC service: /etc/init.d/ollama.

Files

File                                   Purpose
src/pages/admin/ollama.astro           Admin page (HTML, CSS, JS)
src/pages/api/admin/ollama-manage.ts   API route (GET + POST)
src/config/modules/ollama.ts           Module manifest v2.0.0