RAG Data Landscape

Complete map of all vault sources, embedding databases, model variants, and backup archives powering the four-tier RAG system

February 28, 2026

Complete map of every data source, embedding database, and backup in the RAG system. Updated as new vaults are added or models change.


Vault Sources

Source data originates from Obsidian vaults, Arcturus-Prime content directories, and one external source. These are the raw inputs to the ingestion pipeline.

Obsidian Vaults (~/Vaults/)

| Vault | Collection Name | Files | Size | Tier(s) |
| --- | --- | --- | --- | --- |
| Arcturus-Prime-technical | Arcturus-Prime-technical | 1,801 | 80 MB | Knowledge, Vaults, Private |
| knowledge-vault-sanitized | knowledge-sanitized | 1,020 | 35 MB | Knowledge, Vaults, Private |
| argo-os-docs | argo-os-docs | 624 | 9.3 MB | Knowledge, Vaults, Private |
| dev-vault | dev-vault | 860 | 1.2 GB | Knowledge, Vaults, Private |
| ai-context | ai-context | 70 | 332 KB | Knowledge, Vaults, Private |
| build-swarm | build-swarm | 23 | 152 KB | Knowledge, Vaults, Private |
| career | career | 19 | 144 KB | Knowledge, Vaults, Private |
| tendril | tendril | 18 | 284 KB | Knowledge, Vaults, Private |
| jobspy | jobspy | 11 | 68 KB | Knowledge, Vaults, Private |
| laforceit-vault | laforceit | 8 | 40 KB | Knowledge, Vaults, Private |
| main | personal | 4,442 | 5.8 GB | Vaults, Private |
| conversation-archive | conversation-archive | 3,044 | 147 MB | Vaults, Private |
| test | test-conversations | 2 | 80 MB | Vaults, Private |

Arcturus-Prime Content (src/content/)

| Directory | Collection Name | Files | Size | Tier(s) |
| --- | --- | --- | --- | --- |
| src/content/docs | Arcturus-Prime-docs | 108 | 1.3 MB | Knowledge, Vaults, Private |
| src/content/posts | Arcturus-Prime-posts | 76 | 972 KB | Knowledge, Vaults, Private |
| src/content/journal | Arcturus-Prime-journal | 73 | 572 KB | Knowledge, Vaults, Private |
| src/content/projects | Arcturus-Prime-projects | 8 | 56 KB | Knowledge, Vaults, Private |
| src/content/configurations | Arcturus-Prime-configs | 1 | 12 KB | Knowledge, Vaults, Private |
| src/content/learn | Arcturus-Prime-learn | 3 | 24 KB | Knowledge, Vaults, Private |

External Sources

| Source | Collection Name | Files | Size | Tier(s) |
| --- | --- | --- | --- | --- |
| Legal Paperwork | legal-paperwork | 3,624 | 17 GB | Private only |

Vaults NOT in Any Tier

These exist on disk but are intentionally excluded from RAG ingestion:

| Vault | Location | Reason |
| --- | --- | --- |
| ~/Vaults/Instructions | N/A | Prompt templates, not knowledge |
| ~/Vaults/RAG | N/A | Meta/config for RAG itself |
| ~/Vaults/Arcturus-Prime | N/A | Contains credentials |

Embedding Databases

All SQLite databases live in packages/argonaut/data/ (gitignored).

Active Databases

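| Database | Tier | Model | Dims | Docs | Chunks | Size |
| --- | --- | --- | --- | --- | --- | --- |
| rag-store-blog.db | Knowledge | qwen3-embedding:0.6b | 1024 | 3,778 | 33,101 | 290 MB |
| rag-store-vaults.db | Vaults | nomic-embed-text | 768 | 8,297 | 132,151 | 1.1 GB |
| rag-store.db | Private | nomic-embed-text | 768 | 10,440 | 166,183 | 1.5 GB |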
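
As a rough sanity check on these sizes, the raw vector payload of each store can be estimated as chunks × dims × bytes per value. The sketch below assumes 4-byte float32 storage, which this page does not confirm; `StoreStats` and `rawVectorMegabytes` are illustrative names, not part of the repository.

```typescript
// Rough estimate of the raw embedding payload in each active store.
// Assumption (not stated in this document): vectors are stored as float32.
interface StoreStats {
  database: string;
  dims: number;
  chunks: number;
}

const BYTES_PER_FLOAT32 = 4;

// Dimensions and chunk counts are copied from the Active Databases table.
const activeStores: StoreStats[] = [
  { database: "rag-store-blog.db", dims: 1024, chunks: 33101 },
  { database: "rag-store-vaults.db", dims: 768, chunks: 132151 },
  { database: "rag-store.db", dims: 768, chunks: 166183 },
];

// Estimated vector payload in megabytes (1 MB = 1,000,000 bytes).
function rawVectorMegabytes(store: StoreStats): number {
  return (store.chunks * store.dims * BYTES_PER_FLOAT32) / 1_000_000;
}

for (const store of activeStores) {
  const mb = rawVectorMegabytes(store).toFixed(0);
  console.log(`${store.database}: ~${mb} MB of raw vectors`);
}
```

Under that assumption the vectors alone come to roughly 136 MB, 406 MB, and 511 MB, so the rest of each file would be chunk text, metadata, and SQLite overhead.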

Backup / Comparison Databases

| Database | Tier | Model | Dims | Purpose |
| --- | --- | --- | --- | --- |
| rag-store-blog-nomic.db | Knowledge | nomic-embed-text | 768 | A/B comparison baseline |
| rag-store-vaults-nomic.db | Vaults | nomic-embed-text | 768 | Pre-upgrade backup |
| rag-store-private-nomic.db | Private | nomic-embed-text | 768 | Pre-upgrade backup |

Public Tier (Deployed)

| File | Chunks | Size | Model | Dims |
| --- | --- | --- | --- | --- |
| public/embeddings-index.json | 775 | 16.1 MB | OpenRouter text-embedding-3-small | 1536 |

Embedding Models

Installed on Local GPU (RTX 4070 Ti)

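| Model Tag | Size | Dimensions | Context | MTEB Retrieval | Speed |
| --- | --- | --- | --- | --- | --- |
| qwen3-embedding:0.6b | 639 MB | 1024 | 32K | 61.82 | ~8 chunks/s |
| qwen3-embedding:latest (8b) | 4.7 GB | 4096 | 32K | 66.27 | ~2 chunks/s |
| nomic-embed-text:latest | 274 MB | 768 | 8K | 49.01 | ~25 chunks/s |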

Important: The :latest tag for qwen3-embedding maps to the 8b model (4096-dim). Always use :0.6b explicitly to get the 1024-dim model.
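
One way to catch an accidental `:latest` pull before it writes 4096-dim vectors into a 1024-dim store is to check the vector length against the tag that was requested. The following is a minimal sketch; `EXPECTED_DIMS` and `assertEmbeddingDims` are hypothetical names, and the dimensions are the ones listed in the table above.

```typescript
// Expected vector length per model tag, copied from the table above.
const EXPECTED_DIMS: Record<string, number> = {
  "qwen3-embedding:0.6b": 1024,
  "qwen3-embedding:latest": 4096, // resolves to the 8b model
  "nomic-embed-text:latest": 768,
};

// Hypothetical guard: throws when the tag is not one of the known tags
// or when the returned vector has an unexpected length.
function assertEmbeddingDims(modelTag: string, vector: number[]): void {
  const expected = EXPECTED_DIMS[modelTag];
  if (expected === undefined) {
    throw new Error(
      `Unknown model tag "${modelTag}"; use an explicit tag such as qwen3-embedding:0.6b`
    );
  }
  if (vector.length !== expected) {
    throw new Error(
      `${modelTag} returned ${vector.length} dims, expected ${expected}`
    );
  }
}
```

Calling the guard on the first vector of an embedding run, before anything is written to the database, would surface a tag mix-up early.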

Benchmark Results (10-query test)

| Store | Model | Avg Top-1 Score | Avg Search Time |
| --- | --- | --- | --- |
| Knowledge (33K chunks) | qwen3-0.6b | 0.809 | 3,031 ms |
| Private (166K chunks) | nomic | 0.814 | 48,142 ms |
| Vaults (132K chunks) | nomic | 0.770 | 27,177 ms |

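qwen3 delivers comparable top-1 relevance with much faster search times. The stores are not the same size, though: Knowledge holds 33K chunks against 132K to 166K for the nomic stores, so the smaller database likely accounts for much of the gap. Dimensions do not, since qwen3's 1024-dim vectors are larger than nomic's 768-dim ones.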
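
Because the three stores differ in size, dividing search time by chunk count gives a more comparable figure. The sketch below uses the search times from the benchmark table and the exact chunk counts from the Active Databases table; `BenchmarkRow` and `msPerThousandChunks` are illustrative names, and the result is not a controlled comparison (different content, different models).

```typescript
// Normalize average search time by store size.
interface BenchmarkRow {
  store: string;
  chunks: number;
  avgSearchMs: number;
}

const benchmarkRows: BenchmarkRow[] = [
  { store: "Knowledge (qwen3-0.6b)", chunks: 33101, avgSearchMs: 3031 },
  { store: "Private (nomic)", chunks: 166183, avgSearchMs: 48142 },
  { store: "Vaults (nomic)", chunks: 132151, avgSearchMs: 27177 },
];

// Milliseconds of search time per 1,000 chunks in the store.
function msPerThousandChunks(row: BenchmarkRow): number {
  return row.avgSearchMs / (row.chunks / 1000);
}

for (const row of benchmarkRows) {
  console.log(`${row.store}: ${msPerThousandChunks(row).toFixed(0)} ms per 1K chunks`);
}
```

This works out to roughly 92, 290, and 206 ms per 1K chunks, so search time does not scale linearly with chunk count in these measurements.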


Archive on AllShare

All databases and source mirrors are archived on /mnt/AllShare/rag/ (2.0 TB NTFS3 partition, ~1.5 TB free).

/mnt/AllShare/rag/
├── manifest.json              # Full inventory of all collections and databases
├── databases/                 # All .db files (active + backups)
│   ├── rag-store-blog.db      # Knowledge tier (qwen3)
│   ├── rag-store-blog-nomic.db # Knowledge tier (nomic backup)
│   ├── rag-store-vaults.db    # Vaults tier
│   ├── rag-store-vaults-nomic.db # Vaults nomic backup
│   ├── rag-store.db           # Private tier
│   └── rag-store-private-nomic.db # Private nomic backup
└── sources/                   # Mirrored vault sources
    ├── dev-vault/             # 1.2 GB
    ├── Arcturus-Prime-technical/     # 80 MB
    ├── personal/              # 5.8 GB
    ├── legal-paperwork/       # 17 GB
    └── ... (20 collections total)

Policy: Never delete backups from AllShare unless explicitly instructed.
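
To confirm the archive still holds every database listed in the tree above, a directory listing can be compared against the expected file names. This is a hypothetical helper, not one of the repository's scripts, and the format of `manifest.json` is not assumed here.

```typescript
// Database files the archive is expected to contain (from the tree above).
const EXPECTED_DATABASES: string[] = [
  "rag-store-blog.db",
  "rag-store-blog-nomic.db",
  "rag-store-vaults.db",
  "rag-store-vaults-nomic.db",
  "rag-store.db",
  "rag-store-private-nomic.db",
];

// Returns the expected database names that are absent from a directory listing.
function missingDatabases(presentFiles: string[]): string[] {
  const present = new Set(presentFiles);
  return EXPECTED_DATABASES.filter((name) => !present.has(name));
}

// Example with an in-memory listing; in practice the listing could come from
// fs.readdirSync("/mnt/AllShare/rag/databases").
console.log(missingDatabases(["rag-store.db", "rag-store-blog.db"]));
```

The check only reads file names, which keeps it in line with the no-delete policy.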


Tier Composition

How tiers build on each other

Public (775 chunks)
  └── Blog posts, journal, docs, projects, learn
      └── Embedded with OpenRouter text-embedding-3-small (cloud)
      └── Deployed as static JSON to CF Pages CDN

Knowledge / Safe (33,101 chunks)
  └── All 10 Obsidian knowledge vaults
  └── All 6 Arcturus-Prime content directories
      └── Sanitized via identity_map.json (148 patterns)
      └── Embedded with qwen3-embedding:0.6b (local GPU)
      └── Safe for external AI providers

Vaults (132,151 chunks)
  └── Everything in Knowledge
  └── + personal vault (5.8 GB)
  └── + old knowledge base (147 MB)
  └── + test conversations (80 MB)
  └── + Arcturus-Prime configs + learn
      └── NOT sanitized — raw content
      └── Embedded with nomic-embed-text (local GPU)
      └── Local access only

Private / Full (166,183 chunks)
  └── Everything in Vaults
  └── + legal-paperwork (17 GB, 3,624 files)
      └── NOT sanitized — passwords, keys preserved
      └── Embedded with nomic-embed-text (local GPU)
      └── Local access only
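
The cumulative structure above can be modeled as an ordered list of tiers, each adding collections on top of the one before it. The sketch below is illustrative only: the collection names come from the Tier(s) columns of the source tables, while `TIER_ADDITIONS` and `collectionsForTier` are assumed names, and the Public tier is left out because it is a separately embedded static index.

```typescript
type Tier = "knowledge" | "vaults" | "private";

// Ordered from narrowest to widest; each tier includes all earlier ones.
const TIER_ORDER: Tier[] = ["knowledge", "vaults", "private"];

// Collections each tier adds on top of the previous tier.
const TIER_ADDITIONS: Record<Tier, string[]> = {
  knowledge: [
    "Arcturus-Prime-technical", "knowledge-sanitized", "argo-os-docs",
    "dev-vault", "ai-context", "build-swarm", "career", "tendril",
    "jobspy", "laforceit",
    "Arcturus-Prime-docs", "Arcturus-Prime-posts", "Arcturus-Prime-journal",
    "Arcturus-Prime-projects", "Arcturus-Prime-configs", "Arcturus-Prime-learn",
  ],
  vaults: ["personal", "conversation-archive", "test-conversations"],
  private: ["legal-paperwork"],
};

// Resolves the full collection list for a tier by accumulating every
// tier up to and including the requested one.
function collectionsForTier(tier: Tier): string[] {
  const result: string[] = [];
  for (const current of TIER_ORDER) {
    result.push(...TIER_ADDITIONS[current]);
    if (current === tier) break;
  }
  return result;
}

console.log(collectionsForTier("private").length); // 20 collections in total
```

Keeping only the additions per tier means a collection is listed once, and promoting it to a narrower tier is a single move.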

Build & Re-embed Commands

cd ~/Development/Arcturus-Prime

# Build specific tier (ingest + embed)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier knowledge
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier vaults
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier private

# Embed-only (skip file scanning)
npx tsx packages/argonaut/scripts/build-blog-rag.ts --tier knowledge --embed-only

# Re-embed with different model (creates a copy)
npx tsx packages/argonaut/scripts/re-embed-db.ts \
  --source rag-store-blog.db \
  --output rag-store-blog-nomic.db \
  --model nomic-embed-text

# Benchmark/compare databases
npx tsx packages/argonaut/scripts/test-rag-search.ts --benchmark
npx tsx packages/argonaut/scripts/test-rag-search.ts --compare --query "tailscale vpn"
npx tsx packages/argonaut/scripts/test-rag-search.ts --list  # show all discovered DBs

Configuration

Vault sources are defined in packages/argonaut/src/rag/vault-config.ts. Custom vaults can be added via data/vault-config.json:

{
  "knowledge": [
    { "collection": "my-vault", "path": "/path/to/vault", "sourceType": "vault" }
  ],
  "private": []
}

The build script auto-discovers custom vaults and includes them in the appropriate tier.
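
A custom vault file can be checked before a build with a small parser like the one below. It is a sketch based only on the JSON example above: the `CustomVault` and `CustomVaultConfig` types and the `parseVaultConfig` function are assumptions, not the actual definitions in vault-config.ts, and only the `knowledge` and `private` keys shown in the example are handled.

```typescript
// Shape inferred from the JSON example; the real types may differ.
interface CustomVault {
  collection: string;
  path: string;
  sourceType: string;
}

interface CustomVaultConfig {
  knowledge: CustomVault[];
  "private": CustomVault[];
}

// Validates one tier's entry list; a missing key is treated as an empty list.
function readEntries(raw: Record<string, unknown>, key: string): CustomVault[] {
  const entries = raw[key] === undefined ? [] : raw[key];
  if (!Array.isArray(entries)) {
    throw new Error(`"${key}" must be an array`);
  }
  return entries.map((entry: unknown, index: number) => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`${key}[${index}] must be an object`);
    }
    const { collection, path, sourceType } = entry as Record<string, unknown>;
    if (
      typeof collection !== "string" ||
      typeof path !== "string" ||
      typeof sourceType !== "string"
    ) {
      throw new Error(`${key}[${index}] needs string "collection", "path", and "sourceType"`);
    }
    return { collection, path, sourceType };
  });
}

// Parses and validates the contents of a vault-config.json file.
function parseVaultConfig(json: string): CustomVaultConfig {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config must be a JSON object");
  }
  const tiers = raw as Record<string, unknown>;
  return {
    knowledge: readEntries(tiers, "knowledge"),
    private: readEntries(tiers, "private"),
  };
}
```

Failing fast on a malformed entry avoids discovering the problem partway through a long ingest run.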
