Colorado Legal RAG: Free Legal Research for Everyone

Here’s something that should bother you: the law is public. Every statute, every court opinion, every regulation — it’s all public record. Taxpayers fund the courts, the legislature, the entire system. And yet, if you want to actually search that law in any useful way, you’re looking at $300-500/month for Westlaw or LexisNexis.

If you’re a law firm billing $400/hour, that’s a rounding error. If you’re a single parent trying to understand your custody rights, or a small landlord trying to navigate eviction law, or anyone representing themselves in court? You’re locked out of the tools that make the law usable.

I’ve lived in that gap. I’ve navigated family court proceedings where the other side had attorneys and I had Google. I’ve spent nights reading statutes raw, trying to figure out if a particular clause applied to my situation, finding one answer on one site and the opposite on another.

It shouldn’t be this hard. The data is public. The technology exists. Someone just needs to build it.

So I’m building it.


The Series at a Glance

| Part | What It Covers | Key Theme |
|---|---|---|
| Part 1: The Vision (this page) | Why this matters, who it’s for, what we’re building | Access to justice shouldn’t have a paywall |
| Part 2: The Architecture | RAG pipeline, semantic search, LLM routing, citation verification | How semantic search + LLMs make law searchable |
| Part 3: The Caselaw Marathon | Downloading and indexing 67,300 court opinions from 6 Colorado courts | The grind of turning raw data into searchable knowledge |
| Part 4: Titan Infrastructure | Running it all on a 12GB LXC container with memory watchdogs | When you can’t throw money at infrastructure |
| Part 5: What’s Next | Query API, frontend, expansion beyond Colorado | From prototype to tool people actually use |

What Is This, Exactly?

Colorado Legal RAG is a free, self-hosted legal research tool. You ask it a question in plain English (“Can my landlord keep my security deposit if I gave 30 days’ notice?”) and it:

  1. Searches Colorado statutes, case law, court rules, and regulations using semantic search (not keyword matching)
  2. Finds the relevant legal provisions with actual citations
  3. Generates a plain-English answer using an LLM, grounded in the sources it found
  4. Verifies every citation against the source material before showing it to you

That last part is critical. LLMs hallucinate. They make up case names, invent statute numbers, and confidently cite things that don’t exist. Every citation in a Colorado Legal RAG response gets verified against the actual indexed documents. If it can’t be verified, it doesn’t get shown.
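To make the “verify or don’t show” rule concrete, here is a minimal sketch. It assumes CRS citations written as `§ 38-12-103` (optionally with “C.R.S.”) and a set of section numbers that exist in the index. The pipeline described here uses eyecite plus regex; this sketch uses regex alone. `INDEXED_SECTIONS`, `verify_crs_citations`, and `redact_unverified` are hypothetical names.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the section numbers actually present in the index.
INDEXED_SECTIONS = {"38-12-103", "13-80-101"}

# Matches "§ 38-12-103" style cites; title, article, and section may carry
# decimals (e.g. 25.5-4-301). A trailing sentence period is not captured.
CRS_CITE = re.compile(r"§+\s*(\d+(?:\.\d+)?-\d+(?:\.\d+)?-\d+(?:\.\d+)?)")


def verify_crs_citations(answer: str, indexed: set[str]) -> tuple[list[str], list[str]]:
    """Split the section numbers cited in an LLM answer into verified and unverified."""
    verified: list[str] = []
    unverified: list[str] = []
    for section in CRS_CITE.findall(answer):
        (verified if section in indexed else unverified).append(section)
    return verified, unverified


def redact_unverified(answer: str, indexed: set[str]) -> str:
    """Replace any citation that isn't in the index so it never reaches the user."""
    def check(match: re.Match) -> str:
        return match.group(0) if match.group(1) in indexed else "[unverified citation removed]"
    return CRS_CITE.sub(check, answer)
```

Redacting the citation, rather than rejecting the whole answer, is one design choice among several; a stricter version could refuse to show any answer that contains an unverified cite.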

Who Is This For?

Let me be clear about what this is and isn’t.

This IS for:

  • Self-represented litigants (pro se parties) trying to understand their rights
  • Legal aid organizations serving low-income clients
  • Law students researching Colorado law
  • Small firms that can’t justify Westlaw’s pricing
  • Anyone who wants to understand the law that governs their life

This is NOT:

  • A replacement for an attorney
  • Legal advice
  • A competitor to Westlaw or LexisNexis (they have 40 years of editorial enhancements I can’t replicate)
  • Perfect (it will get things wrong — all AI tools do)

The goal isn’t to replace lawyers. It’s to give people a starting point. To let someone walk into a courtroom with some understanding of the relevant law instead of none. To close the gap between “I can’t afford a lawyer” and “I have no idea what my rights are.”

The Data: What We’re Indexing

Colorado law comes from multiple sources, and we’re indexing all of them:

Colorado Revised Statutes (CRS) — The statutory code. Everything from criminal law to landlord-tenant rules to child support guidelines. We’ve already indexed 82,130 chunks from the full CRS. Done.
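The post doesn’t say how the CRS was split into its 82,130 chunks. Here is a minimal sketch of one common approach: fixed-size word windows with overlap, where every chunk carries its section number so a search hit can always be traced back to a citation. The window sizes, the metadata keys, and `chunk_section` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    id: str
    text: str
    metadata: dict[str, str]


def chunk_section(section: str, text: str,
                  max_words: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split one statute section into overlapping word windows tagged with its citation."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    # Stop early enough that the last window isn't pure overlap.
    for i, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        chunks.append(Chunk(
            id=f"crs-{section}-{i}",
            text=" ".join(words[start:start + max_words]),
            metadata={"source": "CRS", "section": section},
        ))
    return chunks
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.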

Case Law — Court opinions from 6 Colorado courts, sourced from CourtListener’s free API. 67,300 opinions total. This is the precedent — how courts have actually interpreted and applied the statutes. Currently building this index.

Federal Statutes — Federal law that applies in Colorado (bankruptcy, civil rights, immigration). Not started yet.

Court Rules & Forms — Procedural rules and official forms. The stuff that tells you how to file, not just what to file. Not started yet.

Why RAG Instead of Just Asking ChatGPT?

You can ask ChatGPT a legal question right now. It’ll give you an answer. That answer might be completely wrong, cite cases that don’t exist, and mix up jurisdictions — but it’ll sound confident doing it.

RAG (Retrieval-Augmented Generation) solves this by grounding the LLM’s response in actual source documents. Instead of asking the model to remember law from its training data (which is incomplete, outdated, and jurisdiction-agnostic), we:

  1. Take the user’s question
  2. Search our indexed legal documents for relevant passages
  3. Feed those passages to the LLM as context
  4. Ask it to answer based only on what we gave it
  5. Verify every citation it produces

The LLM becomes a reading comprehension engine, not a memory recall engine. It’s reading the actual statute and explaining it in plain English — not trying to remember what it learned about Colorado law during training.
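Steps 3 and 4 come down to how the prompt is assembled. Here is a minimal sketch, assuming each retrieved chunk is a dict with `citation` and `text` fields. The prompt wording and `build_grounded_prompt` are hypothetical, not the project’s actual template.

```python
from __future__ import annotations


def build_grounded_prompt(question: str, chunks: list[dict[str, str]]) -> str:
    """Assemble a prompt that restricts the LLM to the retrieved passages."""
    # Number each source so the model (and the verifier) can refer back to it.
    sources = "\n\n".join(
        f"[{i}] {c['citation']}\n{c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the numbered sources below. "
        "Cite each source you rely on by its citation. If the sources do not "
        "answer the question, say so instead of guessing.\n\n"
        f"SOURCES:\n{sources}\n\n"
        f"QUESTION: {question}\n"
    )
```

The explicit “say so instead of guessing” instruction matters as much as the sources themselves: it gives the model a sanctioned way out when retrieval comes back empty.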

The Technical Vision

The architecture is straightforward:

User Question
    ↓
Embedding Model (all-MiniLM-L6-v2)
    ↓
Semantic Search (ChromaDB)
    ↓
RAG Context Builder (relevant chunks + metadata)
    ↓
LLM Router (Simple → local | Complex → Claude/GPT)
    ↓
Citation Verification (eyecite + regex)
    ↓
Verified Answer with Sources
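The Semantic Search step is a nearest-neighbor lookup over embedding vectors. In the real pipeline ChromaDB does this with an approximate index. This stdlib-only sketch shows the underlying idea as exact cosine-similarity top-k search. The three-dimensional toy vectors stand in for the 384-dimensional vectors all-MiniLM-L6-v2 produces, and `cosine` and `top_k` are hypothetical helpers.

```python
from __future__ import annotations

import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk ids whose vectors point closest to the query's."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors standing in for real embeddings.
index = {
    "deposit": [0.9, 0.1, 0.0],
    "custody": [0.0, 0.2, 0.9],
    "eviction": [0.6, 0.7, 0.1],
}
query = [0.8, 0.3, 0.0]
print(top_k(query, index, k=2))
```

This is why the search is “not keyword matching”: a question about getting a deposit back can land near a chunk that never uses the word “deposit,” as long as their embeddings point the same way.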

We’re using ChromaDB for vector storage, lightweight embedding models that can run on modest hardware, and complexity-based routing to choose the right LLM for each query. Simple questions (“What’s the statute of limitations for breach of contract in Colorado?”) get answered by a local model. Complex questions (“How have Colorado courts interpreted the economic loss doctrine in construction defect cases?”) get routed to Claude or GPT-4.
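The post doesn’t specify how complexity is scored. Here is a minimal sketch, assuming a simple keyword-and-length heuristic. The signal phrases, the threshold, and `route_query` are all hypothetical; a production router might use a trained classifier or look at how many distinct sources retrieval returned.

```python
from __future__ import annotations

# Hypothetical phrases suggesting a question needs synthesis across many opinions.
COMPLEX_SIGNALS = ("how have", "interpreted", "courts", "doctrine", "compare", "split")


def complexity_score(question: str) -> int:
    """Count heuristic signals of a multi-source, analytical question."""
    q = question.lower()
    score = sum(1 for signal in COMPLEX_SIGNALS if signal in q)
    if len(q.split()) > 20:  # long questions tend to bundle several issues
        score += 1
    return score


def route_query(question: str, threshold: int = 2) -> str:
    """Return 'local' for simple lookups and 'hosted' for Claude/GPT-class models."""
    return "hosted" if complexity_score(question) >= threshold else "local"
```

Using the two example questions above, the statute-of-limitations lookup scores 0 and stays local, while the economic-loss-doctrine question trips four signals and goes to a hosted model.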

More on the architecture in Part 2.

Why I’m Doing This

I’m not a lawyer. I’m a systems engineer who spent too many nights reading raw statutes because I couldn’t afford the tools that make them searchable.

The experience of navigating the legal system without resources is isolating. You know the information exists somewhere, but you can’t find it efficiently, you can’t verify what you do find, and you’re always wondering if you’re missing something critical.

I can’t build a lawyer. But I can build a search engine that actually understands legal language, verifies its sources, and doesn’t charge $500/month to use.

That’s the project. Let’s build it.


Next up: Part 2 — The Architecture — How semantic search, ChromaDB, and LLM routing turn 82,000 statute chunks into answers you can actually cite.