Content Pipeline (Planned)
Automated conversation-to-content pipeline: scans AI conversations, generates blog post and journal drafts using a voice profile, and queues them for review.
Status: Designed, not yet implemented. See handover: Vaults/handover/2026-03-15-content-studio-pipeline.md
The Content Pipeline automates the flow from AI conversations to published blog posts and journal entries. Daniel reviews and approves — he doesn’t write from scratch.
Pipeline Flow
SCAN → DISCOVER → CONTEXT → GENERATE → QUEUE → REVIEW → PUBLISH
1. SCAN — Find blog-worthy conversations
Source: 2,992 markdown conversations in Vaults/conversation-archive/conversations/
Filters:
- `blog_candidate: true` in frontmatter (pre-flagged during extraction)
- Technical category with 5+ messages and high word count
- Recent handovers and session notes
Categories most likely to produce content:
- `technical` (889 conversations) — homelab, Linux, networking, containers
- `knowledge-work` (155) — PKM, blogging, AI/LLM
- `creative` (209) — some technical writing
Blog candidate index: Vaults/conversation-archive/00-INDEX/BLOG-CANDIDATES.md
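The scan filter above can be sketched as a predicate over the archive frontmatter (field names match the "Conversation Archive Format" section; the 3,000-word threshold is an illustrative stand-in for "high word count"):

```typescript
// Sketch of the SCAN filter. Field names come from the conversation
// frontmatter documented below; the word-count threshold is an assumption.
interface ConversationMeta {
  category: string;
  blog_candidate: boolean;
  blog_status: string;   // e.g. "idea", "published"
  message_count: number;
  word_count: number;
}

function isBlogCandidate(meta: ConversationMeta): boolean {
  if (meta.blog_status === "published") return false; // already turned into a post
  if (meta.blog_candidate) return true;               // pre-flagged during extraction
  // Heuristic fallback: substantial technical conversations
  return (
    meta.category === "technical" &&
    meta.message_count >= 5 &&
    meta.word_count >= 3000
  );
}
```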
2. DISCOVER — Avoid duplication
Compare candidate topics against the existing 104 posts and 97 journal entries. Group related conversations into topic clusters. Skip topics that already have published content.
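A minimal sketch of the duplication check, assuming topics are compared as normalized strings (a real clustering step would likely use fuzzier matching, e.g. embedding similarity):

```typescript
// Keep only candidate topics with no published counterpart.
// Matching is exact after normalization; this is a sketch, not the
// actual dedup logic.
function findNewTopics(candidates: string[], published: string[]): string[] {
  const norm = (t: string) => t.trim().toLowerCase();
  const done = new Set(published.map(norm));
  return candidates.filter((t) => !done.has(norm(t)));
}
```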
3. CONTEXT — Auto-search RAG
For each topic, automatically search:
- Knowledge (103K chunks) — existing blog content, docs
- Vaults (294K chunks) — conversation archive, personal knowledge
- Private (304K chunks) — sensitive context, legal docs
Pull related existing posts to reference and link. Load verified facts from identity-ground-truth.json.
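Results from the three stores need to be merged into a single context window before generation. A sketch, where the `RagHit` shape and the per-store search mechanics are assumptions:

```typescript
// Sketch: merge search hits from the three RAG stores into one ranked
// context list. The RagHit shape is an assumption for illustration.
interface RagHit {
  store: "knowledge" | "vaults" | "private";
  text: string;
  score: number; // higher = more relevant
}

function mergeContext(hits: RagHit[], limit = 8): RagHit[] {
  return [...hits].sort((a, b) => b.score - a.score).slice(0, limit);
}
```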
4. GENERATE — AI writes with voice
Use /api/admin/content-gen with:
- Input: Conversation text + RAG context
- Input mode: `transcript` for conversations, `topic` for clusters
- Voice profile + facts + denied claims injected automatically
- Stream output with proper frontmatter (title, description, tags, mood, pubDate)
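A sketch of the client-side request body for `/api/admin/content-gen`. The two `inputMode` values come from this document; the exact field names of the request schema are assumptions:

```typescript
// Build the request body for /api/admin/content-gen.
// The inputMode values are documented; field names are assumptions.
interface GenerateRequest {
  input: string;                     // conversation text + RAG context
  inputMode: "transcript" | "topic"; // transcript = one conversation, topic = cluster
}

function buildGenerateRequest(text: string, isCluster: boolean): GenerateRequest {
  return { input: text, inputMode: isCluster ? "topic" : "transcript" };
}
// The server injects voice profile, facts, and denied claims, then streams
// markdown back with frontmatter (title, description, tags, mood, pubDate).
```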
5. QUEUE — Store drafts for review
Store generated drafts in a review queue:
- Generated date, source conversations, voice score, status
- Storage option: `src/content/drafts/` collection (fits the Astro pattern)
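The queue entries above can be sketched as a record type with a small status machine. Field names mirror the metadata listed; the status values themselves are assumptions:

```typescript
// Sketch of a queued-draft record and its status transitions.
// The status values ("pending" etc.) are illustrative assumptions.
type DraftStatus = "pending" | "approved" | "rejected";

interface QueuedDraft {
  slug: string;
  generatedAt: string;           // ISO date
  sourceConversations: string[]; // archive file paths
  voiceScore: number;
  status: DraftStatus;
}

// Only pending drafts may change status.
function transition(d: QueuedDraft, next: DraftStatus): QueuedDraft {
  if (d.status !== "pending" && next !== d.status) {
    throw new Error(`cannot move draft from ${d.status} to ${next}`);
  }
  return { ...d, status: next };
}
```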
6. REVIEW — Daniel’s only step
Browse queue in Content Studio’s Pipeline tab:
- One-click approve, edit, or reject
- Voice score shown alongside each draft
- Edit in-place before approving
7. PUBLISH — Write to disk
- Blog posts → `src/content/posts/`
- Journal entries → `src/content/journal/`
- Generate proper slug, frontmatter, hero image reference
- Mark source conversations as `blog_status: published`
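The slug and target-path logic of the publish step might look like this (a sketch; the hyphenation rules are an assumption):

```typescript
// Sketch of the publish step's slug and target-path logic.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumeric runs into hyphens
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
}

function targetPath(kind: "post" | "journal", title: string): string {
  const dir = kind === "post" ? "src/content/posts" : "src/content/journal";
  return `${dir}/${slugify(title)}.md`;
}
```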
API Endpoints Needed
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/admin/pipeline/scan` | POST | Scan conversation archive for candidates |
| `/api/admin/pipeline/generate` | POST | Generate draft from conversation(s) |
| `/api/admin/pipeline/queue` | GET | List pending drafts |
| `/api/admin/pipeline/approve` | POST | Approve draft → publish to disk |
Conversation Archive Format
Each of the 2,992 conversations is a markdown file with YAML frontmatter:
```yaml
---
title: "Debugging Tailscale DNS Resolution"
platform: chatgpt
date: 2024-10-07
category: technical
primary_topic: networking
tags: [tailscale, dns, networking]
blog_candidate: true
blog_status: idea
message_count: 18
word_count: 6833
---
```
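Reading a single field out of this flat frontmatter can be sketched as below. This is illustration only: the real extraction script is Python and presumably uses a proper YAML parser rather than line splitting.

```typescript
// Minimal reader for flat `key: value` frontmatter lines like those above.
// A sketch; does not handle nested YAML, lists, or multi-line values.
function readFrontmatterField(frontmatter: string, key: string): string | undefined {
  for (const line of frontmatter.split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0 && line.slice(0, idx).trim() === key) {
      return line.slice(idx + 1).trim().replace(/^"(.*)"$/, "$1"); // strip quotes
    }
  }
  return undefined;
}
```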
Processing pipeline: Vaults/conversation-archive/_scripts/extract_all.py handles extraction, parsing, sanitization, enrichment, and markdown generation.
What Already Exists
| Component | Status |
|---|---|
| Content-gen API (streaming) | Working |
| Voice profile builder | Working |
| Facts loader + denied claims | Working |
| RAG search + format | Working |
| Voice scoring | Working |
| Content Studio UI | Working |
| Conversation extraction pipeline | Working |
| RAG database (701K chunks) | Populated |
| Pipeline scan endpoint | NOT BUILT |
| Pipeline queue management | NOT BUILT |
| Pipeline UI tab | NOT BUILT |
| Auto-RAG search | NOT BUILT |
| Batch generation | NOT BUILT |
| Draft → publish flow | NOT BUILT |
Related
- Content Studio — The editor where pipeline output is reviewed
- MyVoice Studio — Voice profiles used during generation
- RAG System — Knowledge base powering context search