
Content Generation Pipeline

Voice profile-driven content generation, image creation, writing coaching, fact checking, and the full publish workflow

February 23, 2026

Arcturus-Prime uses a structured pipeline to produce written content that matches a consistent voice and meets quality standards before publishing. The pipeline runs from raw transcripts through AI-assisted drafting, voice authenticity scoring, fact verification, and editorial review to final publication. Every generation step is informed by the canonical voice profile stored at data/voice-profile.md.

Voice Engine

The voice engine is the foundation of all content generation in Arcturus-Prime. It ensures that AI-generated content sounds like it was written by the same person who writes the manually authored posts. The engine loads the voice profile at startup and injects it as system context into every generation request.

How the Voice Engine Works

When a content generation request comes in, the voice engine:

  1. Loads the voice profile from data/voice-profile.md
  2. Constructs a system prompt that combines the voice profile with content-type-specific instructions
  3. Appends any additional context (topic research, RAG-retrieved content, outline)
  4. Sends the assembled prompt to the selected AI model
  5. Streams the response back to the caller
  6. Optionally runs a voice authenticity check on the output

The voice engine is not a separate model or fine-tune. It is a prompt engineering system that constrains general-purpose models to produce output matching a specific writing style.
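Steps 2 and 3 above amount to string assembly. A minimal sketch of that assembly, assuming the profile has already been loaded as a string (the function names, section headings, and per-type instructions here are illustrative, not the real module API):

```typescript
// Illustrative per-type instructions; the real set lives in the voice engine.
const typeInstructions: Record<string, string> = {
  post: "Write a blog post. Open with the problem being solved, not background.",
  journal: "Write a reflective journal entry that includes decision-making rationale.",
};

// Assemble the system prompt: voice profile first, then content-type
// instructions, then any additional context (RAG chunks, outline, research).
function buildSystemPrompt(
  voiceProfile: string,
  type: string,
  extraContext: string[] = []
): string {
  const parts = [
    "## Voice Profile\n" + voiceProfile,
    "## Instructions\n" + (typeInstructions[type] ?? ""),
  ];
  if (extraContext.length > 0) {
    parts.push("## Additional Context\n" + extraContext.join("\n---\n"));
  }
  return parts.join("\n\n");
}
```

The assembled string is what gets sent as system context in step 4; the user message carries the topic and outline.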

Voice Profile (data/voice-profile.md)

The voice profile is the canonical style guide that defines the Arcturus-Prime writing voice. It is a Markdown document containing:

Tone and Personality

  • Technically precise but accessible to intermediate developers
  • Conversational without being casual or unprofessional
  • First-person singular when sharing personal experience, third-person for technical explanations
  • Direct and opinionated, willing to state preferences with reasoning
  • Enthusiastic about homelab and self-hosting without being preachy

Vocabulary Preferences

  • Prefer concrete technical terms over vague descriptions
  • Use “homelab” not “home laboratory” or “home server setup”
  • Use active voice by default, passive voice only for emphasis
  • Avoid corporate jargon: not “leverage” but “use”, not “utilize” but “use”, not “paradigm” but “approach”
  • Spell out acronyms on first use, then use the acronym
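Rules like these are mechanical enough to lint for. A hypothetical pass over draft text, using the replacement pairs from the profile above (the map mirrors the documented examples but is not the production rule set):

```typescript
// Jargon-to-preferred-term map taken from the vocabulary rules above.
const jargonMap: Record<string, string> = {
  leverage: "use",
  utilize: "use",
  paradigm: "approach",
  "home laboratory": "homelab",
};

// Replace each jargon term with its preferred equivalent, whole words only.
function applyVocabularyRules(text: string): string {
  let out = text;
  for (const [jargon, preferred] of Object.entries(jargonMap)) {
    out = out.replace(new RegExp(`\\b${jargon}\\b`, "gi"), preferred);
  }
  return out;
}
```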

Sentence Structure

  • Mix short declarative sentences with longer explanatory ones
  • Lead paragraphs with the main point, follow with supporting detail
  • Use lists and tables for structured information rather than dense paragraphs
  • Code examples should be complete and runnable, not fragments
  • Headings should be descriptive nouns or gerunds, not questions

Topic Expertise

  • Deep: Proxmox, Unraid, Docker, Cloudflare, Astro, Node.js, TypeScript, Python, AI/ML pipelines
  • Moderate: Rust, Go, Kubernetes, Terraform, networking, security
  • Surface: mobile development, game engines, blockchain

Content Patterns

  • Blog posts open with the problem being solved or the goal, not with background
  • Journal entries are more reflective and include decision-making rationale
  • Documentation is structured with clear prerequisites, steps, and verification
  • Project write-ups follow a build-log format: motivation, approach, implementation, results

Content Generation Endpoint

/api/admin/content-gen (POST)

The primary content generation endpoint. Request body:

{
  "type": "post",
  "topic": "Setting up WireGuard VPN on Proxmox LXC containers",
  "outline": [
    "Why WireGuard over OpenVPN",
    "Creating the LXC container on Tarn-Host",
    "Installing and configuring WireGuard",
    "Client configuration for mobile and laptop",
    "Testing and troubleshooting"
  ],
  "length": "long",
  "model": "claude-sonnet-4-20250514",
  "includeRag": true
}

Parameters:

| Field | Type | Description |
| --- | --- | --- |
| type | string | Content type: post, journal, doc, project |
| topic | string | The subject to write about |
| outline | string[] | Optional section outline to follow |
| length | string | Target length: short (500 words), medium (1000), long (2000) |
| model | string | AI model to use (defaults to Claude Sonnet 4) |
| includeRag | boolean | Whether to pull RAG context for the topic |

When includeRag is true, the endpoint queries the RAG pipeline for content related to the topic and includes the five most relevant chunks as additional context. This helps the model reference existing Arcturus-Prime content and stay consistent with previously published material.
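The request fields can be expressed as a type, with the length values mapped to their word budgets. This is an illustrative sketch; the default of medium is an assumption, not documented behavior:

```typescript
// Shape of the content generation request, per the parameter table above.
interface ContentGenRequest {
  type: "post" | "journal" | "doc" | "project";
  topic: string;
  outline?: string[];
  length?: "short" | "medium" | "long";
  model?: string;
  includeRag?: boolean;
}

// Word targets from the length parameter documentation.
const targetWords: Record<string, number> = { short: 500, medium: 1000, long: 2000 };

// Resolve the word budget, assuming "medium" when length is omitted.
function wordBudget(req: ContentGenRequest): number {
  return targetWords[req.length ?? "medium"];
}
```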

The response streams via SSE, with each token delivered as an event. The final event includes the complete generated content and suggested frontmatter:

{
  "content": "# Setting up WireGuard VPN on Proxmox LXC...",
  "frontmatter": {
    "title": "Setting up WireGuard VPN on Proxmox LXC Containers",
    "description": "A step-by-step guide to running WireGuard in an LXC container on Proxmox VE for secure remote access to your homelab",
    "tags": ["wireguard", "vpn", "proxmox", "lxc", "networking"],
    "category": "infrastructure",
    "draft": true
  }
}

Generated content always starts as a draft. It must pass through voice checking and editorial review before being published.
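A client consuming the SSE stream described above could parse each event line along these lines. The final-event shape matches the documented example; the token-event shape (a token field) is an assumption:

```typescript
// The final SSE event carries the full content plus suggested frontmatter.
interface FinalEvent {
  content: string;
  frontmatter: Record<string, unknown>;
}

// Parse one SSE line: returns a token string, the final event object,
// or null for non-data lines and unparseable payloads.
function parseSseLine(line: string): string | FinalEvent | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  try {
    const parsed = JSON.parse(payload);
    if (parsed.content && parsed.frontmatter) return parsed as FinalEvent;
    return typeof parsed.token === "string" ? parsed.token : null;
  } catch {
    return null;
  }
}
```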

Image Generation

/api/admin/generate-image (POST)

Generates images for content headers, diagrams, and illustrations. Request body:

{
  "prompt": "Minimalist technical diagram of a homelab network with three servers connected to a central switch, dark background with cyan accent lines, clean vector style",
  "aspect": "16:9",
  "style": "technical"
}

The endpoint routes through OpenRouter to available image generation models. Generated images are downloaded and stored in public/images/generated/ with a hash-based filename. The response includes the local path for embedding in content:

{
  "path": "/images/generated/wireguard-diagram-a3f7b2.png",
  "width": 1920,
  "height": 1080,
  "prompt": "...",
  "model": "..."
}

Image generation is used by the content lab for creating header images during the drafting process. The generated path can be directly inserted into frontmatter as the heroImage field.
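The hash-based filename could be derived along these lines; the slug-plus-six-hex-characters pattern matches the example path above, but the exact hashing scheme is an assumption:

```typescript
import { createHash } from "node:crypto";

// Derive a stable local path for a generated image: slug plus a short
// content hash of the prompt, so regenerating the same prompt reuses the file.
function generatedImagePath(prompt: string, slug: string): string {
  const hash = createHash("sha256").update(prompt).digest("hex").slice(0, 6);
  return `/images/generated/${slug}-${hash}.png`;
}
```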

AI Writing Coach

/api/admin/ai-coach (POST)

The AI writing coach provides structured feedback on content quality. Submit any piece of content and receive detailed assessment across five dimensions:

{
  "content": "# My Blog Post\n\nContent here...",
  "type": "post",
  "focusAreas": ["voice", "technical"]
}

The response provides scores and actionable feedback:

{
  "overall": 7.8,
  "dimensions": {
    "clarity": { "score": 8, "feedback": "Clear explanations throughout. The WireGuard configuration section could benefit from a brief explanation of what each config key does." },
    "technical": { "score": 9, "feedback": "Accurate technical details. The MTU recommendation of 1420 is correct for WireGuard over IPv4." },
    "voice": { "score": 7, "feedback": "Good overall tone but the introduction feels more formal than typical Arcturus-Prime posts. Consider opening with the specific problem you were solving." },
    "structure": { "score": 8, "feedback": "Logical flow from motivation through implementation. The troubleshooting section would benefit from being broken into subsections by symptom." },
    "engagement": { "score": 7, "feedback": "Solid technical content but could use more personal anecdotes about your specific setup. Mentioning Tarn-Host by name and your actual network config would add authenticity." }
  },
  "suggestions": [
    "Open with the specific problem: wanting to access Tarn-Host at 192.168.20.100 from outside the network",
    "Add your actual WireGuard interface config as an example rather than generic placeholders",
    "Include a screenshot of the Proxmox LXC resource allocation panel"
  ]
}

The focusAreas parameter tells the coach to weight certain dimensions more heavily in its analysis. Available focus areas: voice, technical, structure, engagement, clarity, seo.
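One plausible weighting scheme, sketched here as an assumption since the endpoint's actual math is not documented: focus-area dimensions count double when the overall score is averaged.

```typescript
// Weighted average of dimension scores; focus areas get double weight.
// This is a hypothetical reconstruction, not the endpoint's real formula.
function weightedOverall(
  dimensions: Record<string, number>,
  focusAreas: string[] = []
): number {
  let total = 0;
  let weightSum = 0;
  for (const [dim, score] of Object.entries(dimensions)) {
    const w = focusAreas.includes(dim) ? 2 : 1;
    total += score * w;
    weightSum += w;
  }
  return Math.round((total / weightSum) * 10) / 10; // one decimal place
}
```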

AI Prompts Library

/api/admin/ai-prompts (GET/POST)

A persistent library of prompt templates used across all AI features.

GET /api/admin/ai-prompts returns all stored prompts:

[
  {
    "id": "blog-post-intro",
    "name": "Blog Post Introduction",
    "category": "content-generation",
    "template": "Write an engaging introduction for a blog post about {{topic}}. The introduction should state the problem being solved, why it matters for homelab enthusiasts, and what the reader will learn. Match the voice profile provided in the system prompt.",
    "variables": ["topic"],
    "model": "claude-sonnet-4-20250514",
    "createdAt": "2026-01-15T10:00:00Z"
  }
]

POST /api/admin/ai-prompts creates or updates a prompt template. Templates support {{variable}} placeholders that are filled at invocation time.
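Placeholder substitution can be sketched in a few lines. Here unknown placeholders are left intact, so a missing variable stays visible in the rendered prompt (that fallback behavior is an assumption):

```typescript
// Fill {{variable}} placeholders from a map; leave unknown ones untouched.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in vars ? vars[name] : match
  );
}
```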

The prompts library is used by the content lab, chat interface, and the Argonaut blog writer to maintain consistent prompt engineering across all AI interactions.

Fact Checking

/api/admin/fact-check (POST)

The fact-checking endpoint verifies claims made in blog content. It is particularly important for technical posts where incorrect commands, wrong port numbers, or inaccurate configuration examples could cause real problems for readers.

Request:

{
  "content": "# Setting up WireGuard VPN...",
  "checkTypes": ["technical", "commands", "links"]
}

Response:

{
  "claims": [
    {
      "text": "WireGuard uses UDP port 51820 by default",
      "status": "verified",
      "confidence": 0.98,
      "source": "WireGuard official documentation"
    },
    {
      "text": "The MTU should be set to 1500 for WireGuard tunnels",
      "status": "disputed",
      "confidence": 0.85,
      "correction": "The recommended MTU for WireGuard is 1420 (IPv4) or 1400 (IPv6) to account for WireGuard overhead",
      "source": "WireGuard whitepaper, section 6.1"
    }
  ],
  "commands": [
    {
      "command": "wg-quick up wg0",
      "status": "verified",
      "note": "Standard WireGuard interface activation command"
    }
  ],
  "summary": {
    "totalClaims": 14,
    "verified": 12,
    "disputed": 1,
    "unverified": 1
  }
}

The checkTypes parameter controls what is verified:

  • technical — factual claims about technology, protocols, and configurations
  • commands — shell commands and their expected behavior
  • links — external URLs referenced in the content (checks for 404s)
  • dates — date claims and version numbers
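The summary object in the example response is a straightforward aggregation over claim statuses; a sketch of that roll-up (the helper itself is illustrative):

```typescript
type ClaimStatus = "verified" | "disputed" | "unverified";

// Tally claim statuses into the summary shape shown in the example response.
function summarizeClaims(claims: { status: ClaimStatus }[]) {
  const summary = { totalClaims: claims.length, verified: 0, disputed: 0, unverified: 0 };
  for (const c of claims) summary[c.status] += 1;
  return summary;
}
```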

Voice Authenticity Scoring

/api/admin/voice-check (POST)

Scores content against the voice profile for authenticity. This is the gatekeeper that determines whether generated content sounds like it belongs on Arcturus-Prime.

The endpoint returns a score from 0 to 100:

  • 90-100: Indistinguishable from manually written content. Ready for review.
  • 70-89: Close to the target voice with minor deviations. Acceptable with light editing.
  • 50-69: Noticeable voice mismatch. Requires significant revision or regeneration.
  • Below 50: Does not match the Arcturus-Prime voice. Should be regenerated with adjusted prompts.

The response includes per-dimension scores and specific line-level annotations showing where the voice deviates.
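The score bands above map cleanly to a verdict. A sketch of that mapping, with illustrative verdict labels:

```typescript
// Map a voice authenticity score to an action, per the documented bands.
function voiceVerdict(score: number): string {
  if (score >= 90) return "ready-for-review";
  if (score >= 70) return "light-editing";
  if (score >= 50) return "revise-or-regenerate";
  return "regenerate";
}
```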

Content Sanitization

/api/admin/sanitize (POST)

The sanitization endpoint applies the Galactic Identity System (GIS) rules to content before publication. GIS is Arcturus-Prime’s content safety framework that ensures no sensitive personal information, API keys, internal network details beyond what is intentionally shared, or other private data appears in published content.

Sanitization checks include:

  • API key patterns (OpenRouter, Anthropic, Google, Twilio, etc.)
  • Password and secret patterns
  • Email addresses that are not the public contact address
  • Internal IP addresses that are not part of the documented homelab setup
  • SSH keys and certificate content
  • Database connection strings with credentials
  • Environment variable values that appear to be secrets

Content that triggers sanitization rules is flagged with inline markers showing what would be redacted, allowing the author to make intentional decisions about what to keep and what to remove.
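Pattern-based flagging of this kind can be sketched as a list of named regexes. These patterns are simplified stand-ins, not the actual GIS rule set:

```typescript
// Simplified, illustrative secret patterns; the real GIS rules are broader.
const secretPatterns: [string, RegExp][] = [
  ["api-key", /\bsk-[A-Za-z0-9-]{16,}\b/],
  ["private-key", /-----BEGIN [A-Z ]*PRIVATE KEY-----/],
  ["connection-string", /\b\w+:\/\/[^\s:]+:[^\s@]+@/],
];

// Return the names of every pattern the content triggers.
function flagSensitive(content: string): string[] {
  return secretPatterns
    .filter(([, re]) => re.test(content))
    .map(([name]) => name);
}
```

In the real pipeline the matches would be surfaced as inline markers rather than just rule names, so the author can decide what to keep.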

Full Publish Workflow

The complete content lifecycle from idea to published post:

  1. Transcript/Idea — raw input arrives via the pipeline page (transcript upload), content lab (topic and outline), or direct editor (manual writing)

  2. Draft Generation — if using AI assistance, the content generation endpoint creates a draft matching the voice profile. The draft is saved with draft: true in frontmatter.

  3. Voice Check — the draft is scored against the voice profile via /api/admin/voice-check. Content scoring below 70 is flagged for revision. The score and feedback are attached to the draft as metadata.

  4. Fact Check — technical content is verified via /api/admin/fact-check. Disputed claims are highlighted for the author to address. The fact check report is attached to the draft.

  5. AI Coach — optionally, the draft is submitted to /api/admin/ai-coach for structural and engagement feedback. Suggestions are presented as a checklist.

  6. Sanitization — the draft passes through /api/admin/sanitize to catch any sensitive data that should not be published.

  7. Editorial Review — the draft enters the review queue at /admin/review. A human reviewer reads the content, checks the voice score and fact check reports, and either approves the content (reviewed: true) or sends it back for revision (needsWork: true with notes).

  8. Publish — approved content has draft: false set in frontmatter. On the next build, the content appears on the public site. The Gitea webhook triggers a build that deploys the updated site to Cloudflare Pages.

This pipeline ensures every piece of published content has been checked for voice consistency, factual accuracy, and sensitive data, whether it was written by a human, generated by AI, or produced through a combination of both.
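The gate between steps 3–7 and publication can be sketched as a single predicate. The thresholds mirror the workflow above (voice score of at least 70, no unresolved disputed claims, sanitization passed, human approval), but the helper itself is hypothetical:

```typescript
// Aggregate status a draft accumulates as it moves through the pipeline.
interface DraftStatus {
  voiceScore: number;      // from /api/admin/voice-check
  disputedClaims: number;  // unresolved items from /api/admin/fact-check
  sanitized: boolean;      // passed /api/admin/sanitize
  reviewed: boolean;       // human approval from /admin/review
}

// A draft may flip draft: false only when every gate is satisfied.
function canPublish(d: DraftStatus): boolean {
  return d.voiceScore >= 70 && d.disputedClaims === 0 && d.sanitized && d.reviewed;
}
```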

Tags: content-generation, voice-profile, ai-coach, fact-check, image-gen, sanitization, pipeline