Admin Modules

Ollama Model Manager

Local LLM model lifecycle management with graceful offline handling, VRAM monitoring, and built-in troubleshooting

February 27, 2026

The Ollama page at /admin/ollama manages local LLM models running on the workstation through the Ollama REST API. It provides model lifecycle management (pull, view, delete), VRAM monitoring, popular model suggestions, and graceful offline handling with auto-retry.

Architecture

Admin UI → /api/admin/ollama-manage → Ollama REST API (localhost:11434)
                                       ├── /api/tags (model list)
                                       ├── /api/version (version info)
                                       ├── /api/ps (running models / VRAM)
                                       ├── /api/show (model details)
                                       ├── /api/pull (download with streaming)
                                       └── /api/delete (remove model)

The API route always returns HTTP 200 with a status field (connected or offline). It never returns 502, so the page always renders gracefully regardless of whether Ollama is running.
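The "always HTTP 200" behavior can be sketched as a small helper that folds any probe failure into the JSON payload instead of an error status code. This is an illustrative sketch, not the route's actual code; the type and function names are assumptions.

```typescript
// Any probe failure becomes a { status: "offline" } payload served with
// HTTP 200, so the client page always gets parseable JSON to render.
type OllamaStatus =
  | { status: "connected"; url: string }
  | { status: "offline"; url: string; error: string };

function toStatusPayload(
  url: string,
  probe: { ok: boolean; error?: string }
): OllamaStatus {
  return probe.ok
    ? { status: "connected", url }
    : { status: "offline", url, error: probe.error ?? "Ollama is not reachable" };
}
```

The route would serialize this payload with `new Response(JSON.stringify(...), { status: 200 })` regardless of which branch is taken.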

Page Layout

Stats Row

Five stat cards:

  • Models — count of installed models
  • Disk Used — total size of all models
  • VRAM — current GPU VRAM usage by loaded models (or “idle”)
  • Version — Ollama server version
  • Endpoint — configured API URL
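The Disk Used and VRAM cards both render byte counts from the GET payload. A minimal sketch of that formatting, assuming binary (1024-based) units and "idle" for zero VRAM; the helper names are illustrative:

```typescript
// Format a byte count into a human-readable string for the stat cards.
function formatBytes(bytes: number): string {
  if (bytes <= 0) return "0 B";
  const units = ["B", "KB", "MB", "GB", "TB"];
  const i = Math.min(Math.floor(Math.log2(bytes) / 10), units.length - 1);
  return `${(bytes / 2 ** (10 * i)).toFixed(1)} ${units[i]}`;
}

// VRAM card shows "idle" when no model is loaded in GPU memory.
function vramLabel(vramTotal: number): string {
  return vramTotal > 0 ? formatBytes(vramTotal) : "idle";
}
```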

Pull Model

Input field for model names (e.g., llama3.2:3b, nomic-embed-text) with real-time SSE streaming progress bar. Supports Enter key to submit.
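Ollama's /api/pull streams one JSON object per line; during download phases those objects carry `total` and `completed` byte counts. A sketch of turning one streamed line into a percentage for the progress bar (the function name is illustrative):

```typescript
// Shape of one streamed progress line from Ollama's /api/pull.
interface PullProgress {
  status: string;
  total?: number;     // total bytes for the current layer (download phases only)
  completed?: number; // bytes downloaded so far
}

// Returns a 0-100 percentage, or null for non-download phases
// (e.g. "verifying sha256 digest", "success").
function pullPercent(line: string): number | null {
  const p: PullProgress = JSON.parse(line);
  if (!p.total || p.completed === undefined) return null;
  return Math.round((p.completed / p.total) * 100);
}
```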

Running Models

Shows models currently loaded in GPU VRAM with:

  • Model name
  • VRAM memory usage
  • Unload timer (Ollama auto-unloads after ~5 min inactivity)

Empty when no model has been recently queried.

Installed Models

List of all downloaded models with:

  • Model name — click to view full details
  • Family badge (purple) — model family (qwen2, llama, nomic-bert, etc.)
  • Quantization badge (cyan) — quantization level (Q4_K_M, F16, etc.)
  • Size and parameters — disk size and parameter count
  • Modified date — relative timestamp
  • Copy button — copies model name to clipboard
  • Delete button — removes model with confirmation modal

Popular Models

Grid of 8 common models for one-click pulling:

Model               Description                Size
llama3.2:3b         Fast lightweight chat      2.0 GB
llama3.1:8b         Balanced chat model        4.7 GB
qwen2.5:14b         Strong multilingual        9.0 GB
qwen3:14b           Latest Qwen reasoning      9.3 GB
deepseek-r1:14b     Reasoning-focused          9.0 GB
qwen2.5-coder:14b   Code generation            9.0 GB
nomic-embed-text    Text embeddings for RAG    274 MB
gemma2:9b           Google open model          5.4 GB

Already-installed models are grayed out. Clicking an available model auto-fills the pull input and starts downloading.
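The gray-out check can be sketched as a simple name comparison. One wrinkle worth handling: Ollama stores a bare tag like nomic-embed-text as nomic-embed-text:latest, so the grid entry should match that form too. The function name is illustrative:

```typescript
// True when a popular-model tag is already present in the installed list.
// A bare tag (no ":version") also matches its ":latest" form, since that
// is how Ollama names models pulled without an explicit tag.
function isInstalled(tag: string, installed: string[]): boolean {
  return installed.some((name) => name === tag || name === `${tag}:latest`);
}
```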

Instructions Section

Collapsible panel explaining all features: Pull Models, Running Models, Model Details, Copy & Delete. Notes on model storage location and service configuration.

Troubleshooting Section

Collapsible panel with 5 common issues:

  1. Ollama not responding — rc-service ollama status, restart command, curl test
  2. Pull stuck or failing — disk space check, expected times for large models
  3. “Offline” on Cloudflare Pages — expected behavior, Ollama is local-only
  4. Model loaded but slow — VRAM overflow to system RAM, restart to unload
  5. Wrong endpoint — OLLAMA_API_URL env var configuration

Auto-opens when Ollama is detected as offline.

Offline Handling

When Ollama is unreachable, the page displays:

  • Amber “offline” status badge (not red “error”)
  • Centered offline hero with pulsing icon
  • The configured endpoint URL that failed
  • Restart command hint: sudo rc-service ollama restart
  • Troubleshooting section auto-opens
  • Auto-retry every 15 seconds with countdown timer in header
  • Automatically transitions to online state when Ollama becomes available

When online, a 60-second health check runs in the background to detect if Ollama goes down.
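The 15-second retry countdown can be modeled as a pure per-second tick, which also makes the header timer easy to render; the same shape works for the 60-second online health check. A sketch with illustrative names:

```typescript
// Offline auto-retry: count down one second per tick; at zero, signal a
// retry and reset to the 15-second interval.
const RETRY_INTERVAL = 15;

interface RetryState {
  secondsLeft: number;  // shown in the header countdown
  shouldRetry: boolean; // true exactly on the tick that fires a retry
}

function tick(state: RetryState): RetryState {
  const next = state.secondsLeft - 1;
  return next <= 0
    ? { secondsLeft: RETRY_INTERVAL, shouldRetry: true }
    : { secondsLeft: next, shouldRetry: false };
}
```

In the page this would be driven by setInterval(..., 1000), with a successful retry clearing the interval and switching to the online health check.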

API Route

GET /api/admin/ollama-manage

Always returns HTTP 200. Fetches /api/tags, /api/version, and /api/ps in parallel using Promise.allSettled().
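A hedged sketch of how the three settled probes could be folded into the response shape: any probe may fail independently, and a failed probe just leaves its fields at a default. Field names like size_vram follow Ollama's /api/ps response; the function and Settled type are illustrative:

```typescript
// Structural stand-in for PromiseSettledResult<T>.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

interface Model { name: string; size: number }

// Combine the /api/tags, /api/version, and /api/ps probes. Only the tags
// probe decides connected vs offline; the others degrade gracefully.
function combineProbes(
  tags: Settled<{ models: Model[] }>,
  version: Settled<{ version: string }>,
  ps: Settled<{ models: { size_vram: number }[] }>
) {
  const models = tags.status === "fulfilled" ? tags.value.models : [];
  return {
    status: tags.status === "fulfilled" ? "connected" : "offline",
    modelCount: models.length,
    totalSize: models.reduce((sum, m) => sum + m.size, 0),
    version: version.status === "fulfilled" ? version.value.version : undefined,
    vramTotal:
      ps.status === "fulfilled"
        ? ps.value.models.reduce((sum, m) => sum + m.size_vram, 0)
        : 0,
  };
}
```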

Response shape:

{
  "status": "connected",
  "url": "http://localhost:11434",
  "modelCount": 7,
  "totalSize": 41830817642,
  "version": "0.16.1",
  "models": [...],
  "runningModels": [...],
  "vramTotal": 6243710752
}

When offline:

{
  "status": "offline",
  "url": "http://localhost:11434",
  "error": "Ollama is not reachable"
}

POST /api/admin/ollama-manage

Actions via { action: "..." } body:

Action           Description                   Ollama Endpoint
list-models      List installed models         GET /api/tags
show-model       Full model details            POST /api/show
pull-model       Download with SSE streaming   POST /api/pull
delete-model     Remove a model                DELETE /api/delete
running-models   Models loaded in VRAM         GET /api/ps
health           Connectivity check            GET /api/tags
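A client-side helper for these actions could build the POST request like the sketch below. The { action } body shape matches the route; the helper name and return shape are illustrative:

```typescript
type OllamaAction =
  | "list-models" | "show-model" | "pull-model"
  | "delete-model" | "running-models" | "health";

// Build the fetch options for a POST to /api/admin/ollama-manage.
// Extra params (e.g. a model name) are merged into the JSON body.
function buildActionRequest(
  action: OllamaAction,
  params: Record<string, unknown> = {}
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ action, ...params }),
  };
}
```

Usage would be along the lines of `fetch("/api/admin/ollama-manage", buildActionRequest("delete-model", { model: "llama3.1:8b" }))`.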

Configuration

Variable         Default                  Description
OLLAMA_API_URL   http://localhost:11434   Ollama REST API base URL

Model storage: /mnt/AllShare/ollama-models/ (configured in /etc/conf.d/ollama).

Ollama runs as an OpenRC service: /etc/init.d/ollama.

Files

File                                   Purpose
src/pages/admin/ollama.astro           Admin page (HTML, CSS, JS)
src/pages/api/admin/ollama-manage.ts   API route (GET + POST)
src/config/modules/ollama.ts           Module manifest v2.0.0