AI & Automation

Content Pipeline (Planned)

Automated conversation-to-content pipeline — scans AI conversations, auto-generates blog posts and journal entries with voice profile, queues for review

March 15, 2026


Status: Designed, not yet implemented. See handover: Vaults/handover/2026-03-15-content-studio-pipeline.md

The Content Pipeline automates the flow from AI conversations to published blog posts and journal entries. Daniel reviews and approves — he doesn’t write from scratch.

Pipeline Flow

SCAN → DISCOVER → CONTEXT → GENERATE → QUEUE → REVIEW → PUBLISH

1. SCAN — Find blog-worthy conversations

Source: 2,992 markdown conversations in Vaults/conversation-archive/conversations/

Filters:

  • blog_candidate: true in frontmatter (pre-flagged during extraction)
  • Technical category with 5+ messages and high word count
  • Recent handovers and session notes

Categories most likely to produce content:

  • technical (889 conversations) — homelab, Linux, networking, containers
  • knowledge-work (155) — PKM, blogging, AI/LLM
  • creative (209) — some technical writing

Blog candidate index: Vaults/conversation-archive/00-INDEX/BLOG-CANDIDATES.md
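The scan step above can be sketched as a small frontmatter filter. This is illustrative only: `parse_frontmatter` is a naive stand-in for the parsing that `extract_all.py` already does, and the 3,000-word threshold is an assumed proxy for "high word count".

```python
import re

def parse_frontmatter(text):
    # Naive key: value parser for the YAML block between --- delimiters.
    # A real implementation would use a YAML library (or reuse extract_all.py).
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    meta = {}
    if m:
        for line in m.group(1).splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

def is_blog_candidate(meta, min_messages=5, min_words=3000):
    # Pre-flagged conversations pass immediately
    if meta.get("blog_candidate") == "true":
        return True
    # Otherwise: technical category with enough depth
    return (meta.get("category") == "technical"
            and int(meta.get("message_count", 0)) >= min_messages
            and int(meta.get("word_count", 0)) >= min_words)
```

Running this over every file in `Vaults/conversation-archive/conversations/` would produce the candidate list that feeds the rest of the pipeline.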

2. DISCOVER — Avoid duplication

Compare candidate topics against the existing 104 posts and 97 journal entries. Group related conversations into topic clusters, and skip topics that already have published content.
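One cheap way to approximate this dedup check is title-word overlap; the threshold and normalization here are illustrative assumptions, not a specified algorithm, and embedding similarity via the existing RAG database would likely work better.

```python
import re

def topic_words(title):
    # Crude normalization: lowercase word set, dropping very short words
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) > 3}

def is_duplicate(candidate_title, published_titles, threshold=0.5):
    # Flag the candidate if enough of its words appear in any published title
    cand = topic_words(candidate_title)
    for pub in published_titles:
        if cand and len(cand & topic_words(pub)) / len(cand) >= threshold:
            return True
    return False
```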

3. CONTEXT — Auto-search RAG

For each topic, automatically search:

  • Knowledge (103K chunks) — existing blog content, docs
  • Vaults (294K chunks) — conversation archive, personal knowledge
  • Private (304K chunks) — sensitive context, legal docs

Pull related existing posts to reference and link. Load verified facts from identity-ground-truth.json.
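The auto-search step amounts to fanning one query out across the three collections and merging by score. A minimal sketch, where `search_fn` is a placeholder for the existing RAG search and the result shape (`score` key) is assumed:

```python
# Chunk counts per collection, from the figures above
COLLECTIONS = {
    "knowledge": 103_000,  # existing blog content, docs
    "vaults": 294_000,     # conversation archive, personal knowledge
    "private": 304_000,    # sensitive context, legal docs
}

def gather_context(topic, search_fn, top_k=5):
    # search_fn(collection, query, k) stands in for the real RAG search call
    results = []
    for collection in COLLECTIONS:
        results.extend(search_fn(collection, topic, top_k))
    # Highest-scoring chunks first, regardless of source collection
    return sorted(results, key=lambda r: r["score"], reverse=True)
```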

4. GENERATE — AI writes with voice

Use /api/admin/content-gen with:

  • Input: Conversation text + RAG context
  • InputMode: transcript for conversations, topic for clusters
  • Voice profile + facts + denied claims injected automatically
  • Stream output with proper frontmatter (title, description, tags, mood, pubDate)
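Building the request body for `/api/admin/content-gen` could look like the sketch below. The field names (`input`, `inputMode`, `context`) are assumptions about the API's payload shape; only the transcript-vs-topic mode selection is taken from the description above.

```python
def build_generation_request(conversations, rag_context):
    # inputMode: "transcript" for a single conversation, "topic" for a cluster
    mode = "transcript" if len(conversations) == 1 else "topic"
    return {
        "input": "\n\n---\n\n".join(conversations),
        "inputMode": mode,
        "context": rag_context,
        # Voice profile, facts, and denied claims are injected server-side
    }
```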

5. QUEUE — Store drafts for review

Store generated drafts in a review queue:

  • Generated date, source conversations, voice score, status
  • Storage option: src/content/drafts/ collection (fits Astro pattern)
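A queue entry could be a small record carrying the metadata listed above; the exact field names here are illustrative, not a fixed schema.

```python
from datetime import date

def make_draft_entry(title, source_conversations, voice_score):
    # Fields mirror the queue metadata above; names are illustrative
    return {
        "title": title,
        "generated": date.today().isoformat(),
        "source_conversations": source_conversations,
        "voice_score": voice_score,
        "status": "pending",  # pending -> approved | rejected
    }
```

Stored as frontmatter in `src/content/drafts/`, this keeps the queue browsable with the same Astro collection tooling as published content.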

6. REVIEW — Daniel’s only step

Browse queue in Content Studio’s Pipeline tab:

  • One-click approve, edit, or reject
  • Voice score shown alongside each draft
  • Edit in-place before approving

7. PUBLISH — Write to disk

  • Blog posts → src/content/posts/
  • Journal entries → src/content/journal/
  • Generate proper slug, frontmatter, hero image reference
  • Mark source conversations as blog_status: published
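Slug and destination-path generation for the publish step can be sketched as follows; the slug rules (lowercase, hyphen-separated) are an assumption consistent with common Astro content conventions.

```python
import re
from pathlib import Path

def slugify(title):
    # Lowercase, collapse non-alphanumerics to hyphens, trim edges
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def publish_path(title, kind):
    # kind: "post" -> src/content/posts/, anything else -> src/content/journal/
    base = Path("src/content/posts") if kind == "post" else Path("src/content/journal")
    return base / f"{slugify(title)}.md"
```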

API Endpoints Needed

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/admin/pipeline/scan | POST | Scan conversation archive for candidates |
| /api/admin/pipeline/generate | POST | Generate draft from conversation(s) |
| /api/admin/pipeline/queue | GET | List pending drafts |
| /api/admin/pipeline/approve | POST | Approve draft → publish to disk |

Conversation Archive Format

Each of the 2,992 conversations is a markdown file with YAML frontmatter:

```yaml
---
title: "Debugging Tailscale DNS Resolution"
platform: chatgpt
date: 2024-10-07
category: technical
primary_topic: networking
tags: [tailscale, dns, networking]
blog_candidate: true
blog_status: idea
message_count: 18
word_count: 6833
---
```

Processing pipeline: Vaults/conversation-archive/_scripts/extract_all.py handles extraction, parsing, sanitization, enrichment, and markdown generation.

What Already Exists

| Component | Status |
| --- | --- |
| Content-gen API (streaming) | Working |
| Voice profile builder | Working |
| Facts loader + denied claims | Working |
| RAG search + format | Working |
| Voice scoring | Working |
| Content Studio UI | Working |
| Conversation extraction pipeline | Working |
| RAG database (701K chunks) | Populated |
| Pipeline scan endpoint | NOT BUILT |
| Pipeline queue management | NOT BUILT |
| Pipeline UI tab | NOT BUILT |
| Auto-RAG search | NOT BUILT |
| Batch generation | NOT BUILT |
| Draft → publish flow | NOT BUILT |
Tags: pipeline, automation, content-generation, voice, rag, conversations