RAG System Architecture
Four-tier Retrieval-Augmented Generation system powering Arcturus-Prime's AI search across all knowledge stores
Arcturus-Prime uses a four-tier RAG system that provides AI-powered document search across all knowledge stores. Each tier serves a different audience and security level.
Tiers at a Glance
| Tier | Database | Docs | Chunks | Size | Embedding Model | Dims | Audience |
|---|---|---|---|---|---|---|---|
| Public | embeddings-index.json | — | 775 | 16.1 MB | OpenRouter text-embedding-3-small | 1536 | Anyone (CF Pages CDN) |
| Knowledge | rag-store-blog.db | 3,778 | 33,101 | 290 MB | Ollama qwen3-embedding:0.6b (local GPU) | 1024 | Safe for external AI |
| Vaults | rag-store-vaults.db | 8,297 | 132,151 | ~1.1 GB | Ollama nomic-embed-text (local GPU) | 768 | Local only |
| Private | rag-store.db | 10,440 | 166,183 | 1.5 GB | Ollama nomic-embed-text (local GPU) | 768 | Local only |
Tier Details
Public Tier
Deployed as a static JSON file to Cloudflare Pages CDN. Contains sanitized blog posts, journal entries, documentation, project pages, and learning content. All content passes through the Galactic Identity System before embedding.
Uses OpenRouter’s text-embedding-3-small (1536 dimensions). Rebuilt automatically during npm run build only when content hash changes — costs ~$0.01 per rebuild.
Collections: posts (263 chunks), docs (347), journal (136), projects (11), learn (5)
Knowledge / Safe Tier
SQLite database with all vault knowledge, sanitized via identity_map.json. Safe to use with external AI providers — no real hostnames, IPs, or personal info exposed.
Uses Ollama qwen3-embedding:0.6b on the local RTX 4070 Ti (1024 dimensions, free). Upgraded from nomic-embed-text (768-dim) for better retrieval quality — MTEB retrieval score 61.82 vs 49.01.
14 collections covering: technical vault, sanitized knowledge base, Argo OS docs, dev vault, blog content, AI context, build swarm docs, career docs, and more.
Vaults Tier
Everything in the Knowledge tier plus unsanitized personal vaults and old knowledge base. Excludes legal paperwork. Uses the improved paragraph-aware chunker and text normalization.
19 collections — adds: personal vault (2,243 docs), old knowledge base (2,253 docs), test conversations, Arcturus-Prime configs, and learning content.
Private / Full Tier
Everything. All vaults plus all legal paperwork (PDF, DOCX, RTF, EML, MSG, XLSX). No sanitization. Only accessible locally via Ollama.
19 collections — adds legal-paperwork (2,144 docs, 61,585 chunks) from /mnt/workspace/Documents/Important Documents/Legal/Legal Paperwork/, excluding Duplicate/ and Criminal Case/ subdirectories.
Ingestion Pipeline
Source Files
↓
parseFile() — Multi-format parser (MD, PDF, DOCX, RTF, EML, MSG, XLSX)
↓
normalizeForRAG() — Fix PDF artifacts, hyphenation, page numbers, quotes
↓
cleanText() — Strip HTML, markdown images, code blocks
↓
chunkText() — Paragraph-aware splitting (400 words, 3000 char cap)
↓
SHA-256 dedup — Skip documents with unchanged content hash
↓
SQLite storage — Documents + chunks stored
↓
Context prefix — "[Title | collection]" prepended for embedding only
↓
Ollama embedding — qwen3-embedding:0.6b on RTX 4070 Ti GPU
↓
Vector storage — 1024-dim embeddings saved to SQLite
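The SHA-256 dedup step above can be sketched as follows. This is a minimal illustration of content-hash skipping, not the actual store logic; `contentHash` and `needsIngest` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

/** Hash a document's normalized text so unchanged files can be skipped. */
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

/** Returns true when the document must be (re)ingested. */
function needsIngest(seen: Map<string, string>, path: string, text: string): boolean {
  const hash = contentHash(text);
  if (seen.get(path) === hash) return false; // unchanged, skip
  seen.set(path, hash);
  return true;
}
```

On a rebuild, only documents whose hash differs from the stored one proceed to chunking and embedding, which is what keeps incremental runs cheap.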
Multi-Format Parser
| Format | Parser Method |
|---|---|
| .md, .txt, .csv | Direct file read |
| .pdf | pdftotext -layout (Poppler CLI) |
| .docx | python-docx library |
| .rtf | striprtf library |
| .eml | Python email stdlib |
| .msg | extract-msg library |
| .xlsx | openpyxl library |
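A hypothetical extension-dispatch table mirroring the matrix above; the real parseFile() in parsers.ts may be organized differently, and the strategy names here are illustrative.

```typescript
import { extname } from "node:path";

// Maps a file extension to the parsing strategy the table describes.
const PARSER_BY_EXT: Record<string, string> = {
  ".md": "direct-read", ".txt": "direct-read", ".csv": "direct-read",
  ".pdf": "pdftotext", ".docx": "python-docx", ".rtf": "striprtf",
  ".eml": "email-stdlib", ".msg": "extract-msg", ".xlsx": "openpyxl",
};

function parserFor(path: string): string {
  const parser = PARSER_BY_EXT[extname(path).toLowerCase()];
  if (!parser) throw new Error(`unsupported format: ${path}`);
  return parser;
}
```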
Text Normalization
Applied to all parsed content before ingestion:
- Fixes PDF line-break hyphenation (`disagree-\nment` → `disagreement`)
- Removes standalone page numbers and `Page X of Y` patterns
- Removes form separator lines
- Normalizes smart quotes and em-dashes
- Strips control characters
- Collapses excessive whitespace
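An illustrative regex pass covering the fixes listed above. The real normalizeForRAG() may apply more rules or different replacements (for instance, the em-dash and quote targets here are assumptions):

```typescript
function normalizeForRAG(text: string): string {
  return text
    .replace(/(\w)-\n(\w)/g, "$1$2")           // re-join line-break hyphenation
    .replace(/^\s*\d+\s*$/gm, "")              // standalone page numbers
    .replace(/^\s*Page \d+ of \d+\s*$/gim, "") // "Page X of Y" lines
    .replace(/^[-_=*]{4,}\s*$/gm, "")          // form separator lines
    .replace(/[\u201C\u201D]/g, '"')           // smart double quotes
    .replace(/[\u2018\u2019]/g, "'")           // smart single quotes
    .replace(/\u2014/g, "--")                  // em-dashes
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "") // control chars
    .replace(/\n{3,}/g, "\n\n")                // collapse excess blank lines
    .replace(/[ \t]{2,}/g, " ");               // collapse runs of spaces/tabs
}
```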
Paragraph-Aware Chunking
The chunker groups whole paragraphs together up to the 400-word limit rather than splitting at arbitrary word boundaries. This keeps semantic units intact for better search relevance.
- Word limit: 400 words per chunk
- Overlap: 80 words between consecutive chunks
- Character cap: 3,000 chars max (prevents monster chunks from CSV/URL data)
- Minimum: 100 chars (skips tiny useless chunks)
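A sketch of the paragraph-grouping strategy under the stated limits. The real chunker.ts also implements the 80-word overlap and other edge cases omitted here:

```typescript
const WORD_LIMIT = 400;  // words per chunk
const CHAR_CAP = 3000;   // hard cap for monster chunks
const MIN_CHARS = 100;   // skip tiny chunks

function chunkText(text: string): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n{2,}/)) {
    const candidate = current ? `${current}\n\n${para}` : para;
    const words = candidate.split(/\s+/).filter(Boolean).length;
    if (current && (words > WORD_LIMIT || candidate.length > CHAR_CAP)) {
      // Adding this paragraph would bust a limit: close the chunk whole.
      chunks.push(current);
      current = para;
    } else {
      current = candidate.slice(0, CHAR_CAP);
    }
  }
  if (current) chunks.push(current);
  return chunks.filter((c) => c.length >= MIN_CHARS);
}
```

Because the boundary always falls between paragraphs (until the character cap forces a cut), each chunk stays a coherent semantic unit.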
Context-Prefixed Embeddings
Each chunk’s embedding is computed with a context prefix:
[Motion to Dismiss | legal-paperwork] The court hereby orders that...
The prefix is only used for the embedding computation — stored text stays clean for FTS5 full-text search. This gives the embedding model context about what document each chunk belongs to.
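In sketch form (field names are illustrative), the split between embedding input and stored text looks like:

```typescript
interface Chunk {
  title: string;
  collection: string;
  text: string;
}

// The prefix exists only in the string sent to the embedding model.
function embeddingInput(chunk: Chunk): string {
  return `[${chunk.title} | ${chunk.collection}] ${chunk.text}`;
}

// FTS5 indexes the raw chunk text, untouched by the prefix.
function storedText(chunk: Chunk): string {
  return chunk.text;
}
```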
Search
Three search methods available:
- Vector similarity — cosine similarity against stored embeddings (1024-dim for qwen3, 768-dim for nomic)
- FTS5 full-text search — SQLite FTS5 index on raw chunk text
- Hybrid — combines both scores with configurable weights
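The scoring behind the first and third methods can be sketched as cosine similarity plus a weighted blend. The weight value below is illustrative, not the engine's default, and the FTS5 score is assumed to be pre-normalized to [0, 1]:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Blend vector and full-text scores under a configurable weight.
function hybridScore(vecScore: number, ftsScore: number, vecWeight = 0.7): number {
  return vecWeight * vecScore + (1 - vecWeight) * ftsScore;
}
```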
Build Commands
cd ~/Development/Arcturus-Prime
# Build specific tier
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier knowledge
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier vaults
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private
# Scan only (no ingestion)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private --dry-run
# Resume embedding (skip file scanning)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private --embed-only
# Public tier (runs automatically during npm run build)
node scripts/build-embeddings.js
Storage & Deployment
Local only (gitignored): All SQLite databases (packages/argonaut/data/*.db)
Deployed to Cloudflare Pages:
- public/embeddings-index.json (15.8 MB) — public tier embeddings
- public/embeddings-meta.json (367 bytes) — metadata
- public/rag-stats.json (7 KB) — admin panel stats snapshot
The auto-embeddings.js prebuild script uses content-hash caching to skip rebuilds when content hasn’t changed, keeping Cloudflare build times fast.
Vault Sources
Configured in packages/argonaut/src/rag/vault-config.ts. Supports custom additions via data/vault-config.json.
Knowledge tier (16 sources): All ~/Vaults/* directories plus Arcturus-Prime content directories (src/content/docs, posts, journal, projects, configurations, learn).
Private extras (6 sources): ~/Vaults/main (personal), ~/Vaults/conversation-archive, ~/Vaults/test (conversations), ~/Vaults/RAG (RAG meta-docs), ~/Vaults/Instructions, and Legal Paperwork.
Vaults tier = Knowledge + Private extras minus legal-paperwork.
Key Files
| File | Purpose |
|---|---|
| packages/argonaut/src/rag/parsers.ts | Multi-format file parser + normalizeForRAG() |
| packages/argonaut/src/rag/chunker.ts | Paragraph-aware text chunker |
| packages/argonaut/src/rag/embedder.ts | Ollama/OpenRouter embedding client |
| packages/argonaut/src/rag/sqlite-store.ts | DurableRAGStore — append-only SQLite backend |
| packages/argonaut/src/rag/vault-config.ts | Vault source configuration |
| packages/argonaut/src/rag/sanitizer.ts | Identity map sanitization |
| packages/argonaut/src/rag/hybrid.ts | Hybrid search engine |
| packages/argonaut/scripts/build-blog-rag.ts | CLI build script |
| packages/argonaut/scripts/re-embed-db.ts | Re-embed DB with different model |
| packages/argonaut/scripts/test-rag-search.ts | Benchmark and comparison tool |
| scripts/auto-embeddings.js | Smart public embeddings rebuild |
| src/pages/api/admin/rag.ts | Admin API for vault management |
Embedding Models
Active Models
| Model | Dimensions | Size | Speed | MTEB Retrieval | Used By |
|---|---|---|---|---|---|
| qwen3-embedding:0.6b | 1024 | 639 MB | ~8 chunks/s | 61.82 | Knowledge tier |
| nomic-embed-text | 768 | 274 MB | ~25 chunks/s | 49.01 | Vaults, Private tiers |
| text-embedding-3-small | 1536 | Cloud | Instant | — | Public tier (OpenRouter) |
Nomic vs Qwen3 Comparison
Benchmark results from 10-query test across infrastructure, networking, AI, and deployment topics:
| Store | Model | Avg Top-1 Score | Avg Search Time |
|---|---|---|---|
| Knowledge | qwen3-0.6b | 0.809 | 3,031 ms |
| Private | nomic | 0.814 | 48,142 ms |
| Vaults | nomic | 0.770 | 27,177 ms |
qwen3 achieves comparable relevance scores at ~15x faster search time, though much of that speedup comes from the Knowledge store's far smaller chunk count and DB size (33K chunks vs 132–166K in the nomic tiers) rather than the embedding model alone.
Re-Embedding
Use the re-embed-db.ts script to create model-variant copies of any tier:
# Create a nomic copy of the Knowledge tier
npx tsx packages/argonaut/scripts/re-embed-db.ts \
--source rag-store-blog.db \
--output rag-store-blog-nomic.db \
--model nomic-embed-text
# Create a qwen3 copy of any tier
npx tsx packages/argonaut/scripts/re-embed-db.ts \
--source rag-store.db \
--output rag-store-qwen3.db \
--model qwen3-embedding:0.6b
GPU & Cost
- Hardware: NVIDIA RTX 4070 Ti
- Primary model: qwen3-embedding:0.6b (1024 dimensions)
- Speed: ~8 chunks/second (qwen3), ~25 chunks/second (nomic)
- Knowledge tier rebuild: ~70 minutes for 33K chunks (qwen3)
- Full private rebuild: ~110 minutes for 166K chunks (nomic)
- Cost: $0 for local tiers, ~$0.01 for public tier rebuilds
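The rebuild times follow directly from the chunk counts and throughputs above; a quick check of the arithmetic:

```typescript
// Minutes to embed a tier at a given sustained throughput.
function rebuildMinutes(chunks: number, chunksPerSec: number): number {
  return chunks / chunksPerSec / 60;
}
```

33,101 chunks at ~8 chunks/s works out to roughly 69 minutes, and 166,183 chunks at ~25 chunks/s to roughly 111 minutes, consistent with the ~70 and ~110 minute figures.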
Backups
All RAG databases are archived on /mnt/AllShare/rag/databases/ with model-tagged filenames. Nomic-era backups are preserved for A/B comparison testing. Never delete backups unless explicitly instructed.