Ollama Model Manager
Local LLM model lifecycle management with graceful offline handling, VRAM monitoring, and built-in troubleshooting
The Ollama page at /admin/ollama manages local LLM models running on the workstation through the Ollama REST API. It provides model lifecycle management (pull, view, delete), VRAM monitoring, popular model suggestions, and graceful offline handling with auto-retry.
Architecture
```
Admin UI → /api/admin/ollama-manage → Ollama REST API (localhost:11434)
  ├── /api/tags     (model list)
  ├── /api/version  (version info)
  ├── /api/ps       (running models / VRAM)
  ├── /api/show     (model details)
  ├── /api/pull     (download with streaming)
  └── /api/delete   (remove model)
```
The API route always returns HTTP 200 with a status field (connected or offline). It never returns 502, so the page always renders gracefully regardless of whether Ollama is running.
Page Layout
Stats Row
Five stat cards:
- Models — count of installed models
- Disk Used — total size of all models
- VRAM — current GPU VRAM usage by loaded models (or “idle”)
- Version — Ollama server version
- Endpoint — configured API URL
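The Disk Used and VRAM figures arrive as raw byte counts from Ollama (e.g., totalSize: 41830817642). A minimal formatter sketch for the stat cards (formatBytes is a hypothetical helper name, not necessarily what the page uses):

```typescript
// Convert a raw byte count into the human-readable value shown on a
// stat card, using binary units (1 KB = 1024 B).
function formatBytes(bytes: number): string {
  if (bytes === 0) return "0 B";
  const units = ["B", "KB", "MB", "GB", "TB"];
  const i = Math.min(
    Math.floor(Math.log(bytes) / Math.log(1024)),
    units.length - 1,
  );
  return `${(bytes / 1024 ** i).toFixed(1)} ${units[i]}`;
}
```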
Pull Model
Input field for model names (e.g., llama3.2:3b, nomic-embed-text) with real-time SSE streaming progress bar. Supports Enter key to submit.
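Ollama's /api/pull endpoint streams newline-delimited JSON, where layer-download lines carry total and completed byte counts alongside status-only lines like "pulling manifest". A sketch of how each streamed line could be turned into a progress value (pullPercent is an illustrative name):

```typescript
// Shape of one NDJSON line streamed from POST /api/pull.
interface PullChunk {
  status: string;
  total?: number;
  completed?: number;
}

// Returns a 0-100 progress value for layer-download lines, or null for
// status-only lines ("pulling manifest", "verifying sha256 digest", ...).
function pullPercent(line: string): number | null {
  const chunk: PullChunk = JSON.parse(line);
  if (!chunk.total || chunk.completed === undefined) return null;
  return Math.round((chunk.completed / chunk.total) * 100);
}
```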
Running Models
Shows models currently loaded in GPU VRAM with:
- Model name
- VRAM memory usage
- Unload timer (Ollama auto-unloads after ~5 min inactivity)
Empty when no model has been recently queried.
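The /api/ps response includes an expires_at timestamp per loaded model, from which an unload countdown can be derived. A sketch (minutesUntilUnload is a hypothetical helper):

```typescript
// Minutes remaining before Ollama auto-unloads a model, derived from
// the ISO expires_at timestamp returned by GET /api/ps.
// Clamped to 0 so an already-expired entry never shows negative time.
function minutesUntilUnload(expiresAt: string, now: Date = new Date()): number {
  const ms = new Date(expiresAt).getTime() - now.getTime();
  return Math.max(0, Math.ceil(ms / 60_000));
}
```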
Installed Models
List of all downloaded models with:
- Model name — click to view full details
- Family badge (purple) — model family (qwen2, llama, nomic-bert, etc.)
- Quantization badge (cyan) — quantization level (Q4_K_M, F16, etc.)
- Size and parameters — disk size and parameter count
- Modified date — relative timestamp
- Copy button — copies model name to clipboard
- Delete button — removes model with confirmation modal
Popular Models
Grid of 8 common models for one-click pulling:
| Model | Description | Size |
|---|---|---|
| llama3.2:3b | Fast lightweight chat | 2.0 GB |
| llama3.1:8b | Balanced chat model | 4.7 GB |
| qwen2.5:14b | Strong multilingual | 9.0 GB |
| qwen3:14b | Latest Qwen reasoning | 9.3 GB |
| deepseek-r1:14b | Reasoning-focused | 9.0 GB |
| qwen2.5-coder:14b | Code generation | 9.0 GB |
| nomic-embed-text | Text embeddings for RAG | 274 MB |
| gemma2:9b | Google open model | 5.4 GB |
Already-installed models are grayed out. Clicking an available model auto-fills the pull input and starts downloading.
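Graying out installed suggestions needs a name comparison that accounts for Ollama listing untagged pulls as name:latest. A sketch of that check (isInstalled is an illustrative name):

```typescript
// True if a suggested popular model is already present locally.
// An untagged suggestion like "nomic-embed-text" must match the
// installed "nomic-embed-text:latest" that Ollama reports.
function isInstalled(suggested: string, installed: string[]): boolean {
  const want = suggested.includes(":") ? suggested : `${suggested}:latest`;
  return installed.includes(want);
}
```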
Instructions Section
Collapsible panel explaining all features: Pull Models, Running Models, Model Details, Copy & Delete. Notes on model storage location and service configuration.
Troubleshooting Section
Collapsible panel with 5 common issues:
- Ollama not responding — rc-service ollama status, restart command, curl test
- Pull stuck or failing — disk space check, expected times for large models
- “Offline” on Cloudflare Pages — expected behavior, Ollama is local-only
- Model loaded but slow — VRAM overflow to system RAM, restart to unload
- Wrong endpoint — OLLAMA_API_URL env var configuration
Auto-opens when Ollama is detected as offline.
Offline Handling
When Ollama is unreachable, the page displays:
- Amber “offline” status badge (not red “error”)
- Centered offline hero with pulsing icon
- The configured endpoint URL that failed
- Restart command hint: sudo rc-service ollama restart
- Troubleshooting section auto-opens
- Auto-retry every 15 seconds with countdown timer in header
- Automatically transitions to online state when Ollama becomes available
When online, a 60-second health check runs in the background to detect if Ollama goes down.
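The two cadences described above (15-second retry while offline, 60-second health check while online) boil down to picking the next poll delay from the last status. A sketch, with illustrative names; the real page logic also drives the countdown timer in the header:

```typescript
type OllamaStatus = "connected" | "offline";

// Poll aggressively while offline (15 s auto-retry), lazily while
// online (60 s background health check) — values from the docs above.
function nextPollMs(status: OllamaStatus): number {
  return status === "offline" ? 15_000 : 60_000;
}

// Illustrative loop: refresh() would re-fetch GET /api/admin/ollama-manage
// and re-render the page, returning the status field of the response.
async function pollLoop(refresh: () => Promise<OllamaStatus>): Promise<void> {
  const status = await refresh();
  setTimeout(() => pollLoop(refresh), nextPollMs(status));
}
```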
API Route
GET /api/admin/ollama-manage
Always returns HTTP 200. Fetches /api/tags, /api/version, and /api/ps in parallel using Promise.allSettled().
Response shape:
```json
{
  "status": "connected",
  "url": "http://localhost:11434",
  "modelCount": 7,
  "totalSize": 41830817642,
  "version": "0.16.1",
  "models": [...],
  "runningModels": [...],
  "vramTotal": 6243710752
}
```
When offline:
```json
{
  "status": "offline",
  "url": "http://localhost:11434",
  "error": "Ollama is not reachable"
}
```
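The two shapes above can be produced by collapsing the parallel Promise.allSettled() results into one always-200 payload. A sketch under the assumption that a failed /api/tags call means the whole service is offline (buildStatus is a hypothetical helper; the real handler lives in src/pages/api/admin/ollama-manage.ts):

```typescript
// Collapse the three parallel Ollama calls into the single payload the
// page consumes. The route returns this with HTTP 200 either way —
// "offline" is an ordinary state, not an error response.
function buildStatus(
  url: string,
  tags: PromiseSettledResult<{ models: unknown[] }>,
  version: PromiseSettledResult<{ version: string }>,
  ps: PromiseSettledResult<{ models: unknown[] }>,
) {
  if (tags.status === "rejected") {
    return { status: "offline", url, error: "Ollama is not reachable" };
  }
  return {
    status: "connected",
    url,
    modelCount: tags.value.models.length,
    version: version.status === "fulfilled" ? version.value.version : null,
    runningModels: ps.status === "fulfilled" ? ps.value.models : [],
  };
}
```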
POST /api/admin/ollama-manage
Actions via { action: "..." } body:
| Action | Description | Ollama Endpoint |
|---|---|---|
list-models | List installed models | GET /api/tags |
show-model | Full model details | POST /api/show |
pull-model | Download with SSE streaming | POST /api/pull |
delete-model | Remove a model | DELETE /api/delete |
running-models | Models loaded in VRAM | GET /api/ps |
health | Connectivity check | GET /api/tags |
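The table above is essentially a dispatch from action name to Ollama endpoint. A sketch of that mapping in the POST handler (toOllamaRequest is an illustrative name):

```typescript
// Map a { action: "..." } body to the Ollama method/path it proxies,
// mirroring the action table in the docs.
function toOllamaRequest(action: string): { method: string; path: string } {
  switch (action) {
    case "list-models":
    case "health":
      return { method: "GET", path: "/api/tags" };
    case "show-model":
      return { method: "POST", path: "/api/show" };
    case "pull-model":
      return { method: "POST", path: "/api/pull" };
    case "delete-model":
      return { method: "DELETE", path: "/api/delete" };
    case "running-models":
      return { method: "GET", path: "/api/ps" };
    default:
      throw new Error(`Unknown action: ${action}`);
  }
}
```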
Configuration
| Variable | Default | Description |
|---|---|---|
OLLAMA_API_URL | http://localhost:11434 | Ollama REST API base URL |
Model storage: /mnt/AllShare/ollama-models/ (configured in /etc/conf.d/ollama).
Ollama runs as an OpenRC service: /etc/init.d/ollama.
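Resolving the endpoint with its documented default might look like the following sketch (the real route may read import.meta.env rather than process.env in an Astro project):

```typescript
// OLLAMA_API_URL overrides the endpoint; otherwise fall back to the
// documented default of http://localhost:11434.
function ollamaUrl(env: Record<string, string | undefined>): string {
  return env.OLLAMA_API_URL ?? "http://localhost:11434";
}
```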
Files
| File | Purpose |
|---|---|
src/pages/admin/ollama.astro | Admin page (HTML, CSS, JS) |
src/pages/api/admin/ollama-manage.ts | API route (GET + POST) |
src/config/modules/ollama.ts | Module manifest v2.0.0 |