
RAG System Architecture

Four-tier Retrieval-Augmented Generation system powering Arcturus-Prime's AI search across all knowledge stores

February 27, 2026

Arcturus-Prime uses a four-tier RAG system that provides AI-powered document search across all knowledge stores. Each tier serves a different audience and security level.

Tiers at a Glance

Tier        Database                Docs     Chunks    Size      Embedding Model                            Dims   Audience
Public      embeddings-index.json   775      —         16.1 MB   OpenRouter text-embedding-3-small          1536   Anyone (CF Pages CDN)
Knowledge   rag-store-blog.db       3,778    33,101    290 MB    Ollama qwen3-embedding:0.6b (local GPU)    1024   Safe for external AI
Vaults      rag-store-vaults.db     8,297    132,151   ~1.1 GB   Ollama nomic-embed-text (local GPU)        768    Local only
Private     rag-store.db            10,440   166,183   1.5 GB    Ollama nomic-embed-text (local GPU)        768    Local only

Tier Details

Public Tier

Deployed as a static JSON file to Cloudflare Pages CDN. Contains sanitized blog posts, journal entries, documentation, project pages, and learning content. All content passes through the Galactic Identity System before embedding.

Uses OpenRouter’s text-embedding-3-small (1536 dimensions). Rebuilt automatically during npm run build only when content hash changes — costs ~$0.01 per rebuild.

Collections: posts (263 chunks), docs (347), journal (136), projects (11), learn (5)

Knowledge / Safe Tier

SQLite database with all vault knowledge, sanitized via identity_map.json. Safe to use with external AI providers — no real hostnames, IPs, or personal info exposed.
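
The sanitization step can be sketched roughly as below. The shape of identity_map.json, the helper name, and the example mappings are all assumptions for illustration, not the real file:

```typescript
// Hypothetical sketch of identity-map sanitization: every real hostname,
// IP, or name in the map is replaced with its safe alias before embedding.
type IdentityMap = Record<string, string>; // real value -> safe alias

function sanitize(text: string, map: IdentityMap): string {
  // Replace longer keys first so "nas.home.lan" wins over a bare "nas"
  const keys = Object.keys(map).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    text = text.split(key).join(map[key]);
  }
  return text;
}

// Example mappings (invented for illustration)
const map: IdentityMap = {
  "10.0.0.42": "203.0.113.10",
  "nas.home.lan": "storage.example.net",
};
sanitize("ssh admin@nas.home.lan (10.0.0.42)", map);
// -> "ssh admin@storage.example.net (203.0.113.10)"
```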

Uses Ollama qwen3-embedding:0.6b on the local RTX 4070 Ti (1024 dimensions, free). Upgraded from nomic-embed-text (768-dim) for better retrieval quality — MTEB retrieval score 61.82 vs 49.01.

14 collections covering: technical vault, sanitized knowledge base, Argo OS docs, dev vault, blog content, AI context, build swarm docs, career docs, and more.

Vaults Tier

Everything in the Knowledge tier plus unsanitized personal vaults and old knowledge base. Excludes legal paperwork. Uses the improved paragraph-aware chunker and text normalization.

19 collections — adds: personal vault (2,243 docs), old knowledge base (2,253 docs), test conversations, Arcturus-Prime configs, and learning content.

Private / Full Tier

Everything. All vaults plus all legal paperwork (PDF, DOCX, RTF, EML, MSG, XLSX). No sanitization. Only accessible locally via Ollama.

19 collections — adds legal-paperwork (2,144 docs, 61,585 chunks) from /mnt/workspace/Documents/Important Documents/Legal/Legal Paperwork/, excluding Duplicate/ and Criminal Case/ subdirectories.

Ingestion Pipeline

Source Files

parseFile()          — Multi-format parser (MD, PDF, DOCX, RTF, EML, MSG, XLSX)

normalizeForRAG()    — Fix PDF artifacts, hyphenation, page numbers, quotes

cleanText()          — Strip HTML, markdown images, code blocks

chunkText()          — Paragraph-aware splitting (400 words, 3000 char cap)

SHA-256 dedup        — Skip documents with unchanged content hash

SQLite storage       — Documents + chunks stored

Context prefix       — "[Title | collection]" prepended for embedding only

Ollama embedding     — qwen3-embedding:0.6b on RTX 4070 Ti GPU

Vector storage       — 1024-dim embeddings saved to SQLite
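
The SHA-256 dedup step in the pipeline above can be sketched as follows; the function shape is an assumption, but the idea matches the described behavior of skipping documents whose content hash is unchanged:

```typescript
import { createHash } from "node:crypto";

// Skip re-ingesting a document whose content hash is already stored.
function shouldIngest(content: string, seenHashes: Set<string>): boolean {
  const hash = createHash("sha256").update(content).digest("hex");
  if (seenHashes.has(hash)) return false; // unchanged since last run: skip
  seenHashes.add(hash);
  return true;
}
```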

Multi-Format Parser

Format            Parser Method
.md, .txt, .csv   Direct file read
.pdf              pdftotext -layout (Poppler CLI)
.docx             python-docx library
.rtf              striprtf library
.eml              Python email stdlib
.msg              extract-msg library
.xlsx             openpyxl library

Text Normalization

Applied to all parsed content before ingestion:

  • Fixes PDF line-break hyphenation (disagree-\nment → disagreement)
  • Removes standalone page numbers and Page X of Y patterns
  • Removes form separator lines
  • Normalizes smart quotes and em-dashes
  • Strips control characters
  • Collapses excessive whitespace
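
The rules above can be sketched as a chain of replacements. This is a minimal approximation; the real normalizeForRAG() in parsers.ts may order or scope these differently:

```typescript
// Minimal sketch of the normalization rules listed above.
function normalizeForRAG(text: string): string {
  return text
    .replace(/(\w)-\n(\w)/g, "$1$2")                  // rejoin PDF hyphenation
    .replace(/^\s*(\d+|Page \d+ of \d+)\s*$/gim, "")  // standalone page numbers
    .replace(/[\u201C\u201D]/g, '"')                  // smart double quotes
    .replace(/[\u2018\u2019]/g, "'")                  // smart single quotes
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "") // control chars
    .replace(/\n{3,}/g, "\n\n")                       // collapse blank runs
    .trim();
}
```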

Paragraph-Aware Chunking

The chunker groups whole paragraphs together up to the 400-word limit rather than splitting at arbitrary word boundaries. This keeps semantic units intact for better search relevance.

  • Word limit: 400 words per chunk
  • Overlap: 80 words between consecutive chunks
  • Character cap: 3,000 chars max (prevents monster chunks from CSV/URL data)
  • Minimum: 100 chars (skips tiny useless chunks)
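
A simplified sketch of the chunker under those limits, grouping whole paragraphs until adding the next one would exceed a cap; the 80-word overlap carry-over is omitted here for brevity, and the real chunker.ts likely differs in detail:

```typescript
// Paragraph-aware chunking sketch: whole paragraphs are grouped up to the
// word/char caps; tiny chunks below the minimum are dropped.
function chunkText(
  text: string,
  maxWords = 400,
  maxChars = 3000,
  minChars = 100,
): string[] {
  const words = (s: string) => s.split(/\s+/).filter(Boolean).length;
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n\s*\n/)) {
    const candidate = current ? `${current}\n\n${para}` : para;
    if (current && (words(candidate) > maxWords || candidate.length > maxChars)) {
      chunks.push(current);
      current = para; // real chunker also carries ~80 words of overlap here
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks.filter((c) => c.length >= minChars); // skip tiny chunks
}
```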

Context-Prefixed Embeddings

Each chunk’s embedding is computed with a context prefix:

[Motion to Dismiss | legal-paperwork] The court hereby orders that...

The prefix is only used for the embedding computation — stored text stays clean for FTS5 full-text search. This gives the embedding model context about what document each chunk belongs to.
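
The split between stored text and embedding input can be illustrated like this (the helper name is hypothetical; see the build script for the real flow):

```typescript
// The clean chunk text is what FTS5 indexes; only the embedder ever
// sees the "[Title | collection]" prefix.
function prepareChunk(title: string, collection: string, text: string) {
  return {
    stored: text,                                     // indexed by FTS5 as-is
    embedInput: `[${title} | ${collection}] ${text}`, // sent to the embedder
  };
}

const chunk = prepareChunk(
  "Motion to Dismiss",
  "legal-paperwork",
  "The court hereby orders that...",
);
// chunk.embedInput === "[Motion to Dismiss | legal-paperwork] The court hereby orders that..."
```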

Three search methods available:

  1. Vector similarity — cosine similarity against stored embeddings (1024-dim for qwen3, 768-dim for nomic)
  2. FTS5 full-text search — SQLite FTS5 index on raw chunk text
  3. Hybrid — combines both scores with configurable weights
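
A minimal sketch of the hybrid combination, assuming the FTS5 score has already been normalized to [0, 1]; the actual weighting and normalization in hybrid.ts may differ:

```typescript
// Cosine similarity between a query vector and a stored chunk embedding.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Blend vector and full-text scores with a configurable weight.
function hybridScore(vecSim: number, ftsScore: number, vectorWeight = 0.7): number {
  return vectorWeight * vecSim + (1 - vectorWeight) * ftsScore;
}
```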

Build Commands

cd ~/Development/Arcturus-Prime

# Build specific tier
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier knowledge
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier vaults
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private

# Scan only (no ingestion)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private --dry-run

# Resume embedding (skip file scanning)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private --embed-only

# Public tier (runs automatically during npm run build)
node scripts/build-embeddings.js

Storage & Deployment

Local only (gitignored): All SQLite databases (packages/argonaut/data/*.db)

Deployed to Cloudflare Pages:

  • public/embeddings-index.json (15.8 MB) — public tier embeddings
  • public/embeddings-meta.json (367 bytes) — metadata
  • public/rag-stats.json (7 KB) — admin panel stats snapshot

The auto-embeddings.js prebuild script uses content-hash caching to skip rebuilds when content hasn’t changed, keeping Cloudflare build times fast.
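
The caching check amounts to comparing one combined content hash against the previous build's value; a rough TypeScript sketch (the real script is plain JS, and the function shapes here are assumptions):

```typescript
import { createHash } from "node:crypto";

// Combine per-file content into one corpus hash (order-independent).
function contentHash(fileContents: string[]): string {
  const h = createHash("sha256");
  for (const c of [...fileContents].sort()) h.update(c);
  return h.digest("hex");
}

// Rebuild only when the corpus hash differs from the cached one.
function needsRebuild(current: string, cached: string | null): boolean {
  return current !== cached; // a match means content is unchanged: skip
}
```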

Vault Sources

Configured in packages/argonaut/src/rag/vault-config.ts. Supports custom additions via data/vault-config.json.

Knowledge tier (16 sources): All ~/Vaults/* directories plus Arcturus-Prime content directories (src/content/docs, posts, journal, projects, configurations, learn).

Private extras (6 sources): ~/Vaults/main (personal), ~/Vaults/conversation-archive, ~/Vaults/test (conversations), ~/Vaults/RAG (RAG meta-docs), ~/Vaults/Instructions, and Legal Paperwork.

Vaults tier = Knowledge + Private extras minus legal-paperwork.

Key Files

File                                            Purpose
packages/argonaut/src/rag/parsers.ts            Multi-format file parser + normalizeForRAG()
packages/argonaut/src/rag/chunker.ts            Paragraph-aware text chunker
packages/argonaut/src/rag/embedder.ts           Ollama/OpenRouter embedding client
packages/argonaut/src/rag/sqlite-store.ts       DurableRAGStore — append-only SQLite backend
packages/argonaut/src/rag/vault-config.ts       Vault source configuration
packages/argonaut/src/rag/sanitizer.ts          Identity map sanitization
packages/argonaut/src/rag/hybrid.ts             Hybrid search engine
packages/argonaut/scripts/build-blog-rag.ts     CLI build script
packages/argonaut/scripts/re-embed-db.ts        Re-embed DB with different model
packages/argonaut/scripts/test-rag-search.ts    Benchmark and comparison tool
scripts/auto-embeddings.js                      Smart public embeddings rebuild
src/pages/api/admin/rag.ts                      Admin API for vault management

Embedding Models

Active Models

Model                    Dimensions   Size     Speed          MTEB Retrieval   Used By
qwen3-embedding:0.6b     1024         639 MB   ~8 chunks/s    61.82            Knowledge tier
nomic-embed-text         768          274 MB   ~25 chunks/s   49.01            Vaults, Private tiers
text-embedding-3-small   1536         Cloud    Instant        —                Public tier (OpenRouter)

Nomic vs Qwen3 Comparison

Benchmark results from 10-query test across infrastructure, networking, AI, and deployment topics:

Store       Model        Avg Top-1 Score   Avg Search Time
Knowledge   qwen3-0.6b   0.809             3,031 ms
Private     nomic        0.814             48,142 ms
Vaults      nomic        0.770             27,177 ms

qwen3 achieves comparable relevance scores at roughly 15x faster search time, largely because the Knowledge store is far smaller (33K chunks in 290 MB vs 166K chunks in 1.5 GB for Private), despite qwen3's larger 1024-dim vectors.

Re-Embedding

Use the re-embed-db.ts script to create model-variant copies of any tier:

# Create a nomic copy of the Knowledge tier
npx tsx packages/argonaut/scripts/re-embed-db.ts \
  --source rag-store-blog.db \
  --output rag-store-blog-nomic.db \
  --model nomic-embed-text

# Create a qwen3 copy of any tier
npx tsx packages/argonaut/scripts/re-embed-db.ts \
  --source rag-store.db \
  --output rag-store-qwen3.db \
  --model qwen3-embedding:0.6b

GPU & Cost

  • Hardware: NVIDIA RTX 4070 Ti
  • Primary model: qwen3-embedding:0.6b (1024 dimensions)
  • Speed: ~8 chunks/second (qwen3), ~25 chunks/second (nomic)
  • Knowledge tier rebuild: ~70 minutes for 33K chunks (qwen3)
  • Full private rebuild: ~110 minutes for 166K chunks (nomic)
  • Cost: $0 for local tiers, ~$0.01 for public tier rebuilds
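
The rebuild-time figures above follow directly from the throughput numbers, as a quick sanity check:

```typescript
// Estimated rebuild time in minutes from chunk count and embedding throughput.
const minutes = (chunks: number, chunksPerSec: number) =>
  Math.round(chunks / chunksPerSec / 60);

minutes(33_101, 8);   // ≈ 69 min, matching the ~70-minute Knowledge rebuild
minutes(166_183, 25); // ≈ 111 min, matching the ~110-minute Private rebuild
```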

Backups

All RAG databases are archived on /mnt/AllShare/rag/databases/ with model-tagged filenames. Nomic-era backups are preserved for A/B comparison testing. Never delete backups unless explicitly instructed.
