When Your Docs Page Becomes a Credential Dump

I ran a comprehensive security audit on ArgoBox — my homelab management dashboard. 87,478 files scanned across 15 categories. 14 parallel agents crawling through TypeScript, Astro templates, API routes, Docker configs, and git history.

The worst finding wasn't in the code. It was in the documentation.


The Docs Page Problem

ArgoBox has a docs section. One page is called environment-variables.md — an admin reference explaining what environment variables the application uses. API keys for cloud services. Database credentials. Infrastructure tokens.

Standard documentation pattern. Every project has a page like this. Normally it looks something like:

```
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
CLOUDFLARE_API_KEY=your-key-here
```

Mine looked like:

```
GITHUB_TOKEN=ghp_A3k9... (actual token)
CLOUDFLARE_API_KEY=c8f2... (actual key)
```

Real values. 30+ of them. GitHub tokens. Cloudflare API keys. Credentials for 6 different AI providers. Twilio. Resend. Infrastructure tokens for internal services.

All committed to git. In a markdown file. In the content directory. Rendered as a documentation page anyone with repo access could read.

I don't know exactly when I pasted real values instead of placeholders. Probably during a late-night setup session where I was copying from my terminal and forgot to sanitize before committing. One careless paste, 30+ leaked secrets.


The Scale of the Problem

The docs page was the primary offender, but not the only one. The full audit found credentials scattered across:

  • Configuration files with real values instead of references to environment variables
  • Debug endpoints that expose key metadata (not the keys themselves, but lengths and prefixes — enough to narrow a brute-force search)
  • Data files that should never have been committed

48 total API keys, tokens, and passwords inventoried across the codebase. Some duplicated across files. Some only appearing in git history, removed from current files but still recoverable with git log -p.


Why This Matters (Even for a Homelab)

"It's just a homelab project. The repo is private." I've told myself this. You've probably told yourself something similar.

But private repos get cloned. They get pushed to mirrors. They get shared with collaborators. They get forked. And git history is permanent — every credential ever committed is recoverable unless you rewrite history.

My ArgoBox repo has 2,052 commits. Some of those credentials have been sitting in the history for months. Even after I remove them from current files, they're a git log --all -S "ghp_" away from being found.

The real fix requires history rewriting. git filter-repo or BFG Repo-Cleaner. That's a destructive operation that invalidates every clone and fork. Not something you do casually.


XSS: The Sanitizer That Doesn't

Two XSS vulnerabilities. Different mechanisms, same root cause: trusting that input sanitization is working when it isn't.

The Email Regex Sanitizer

There's an email rendering component that accepts HTML content and runs it through a regex-based sanitizer before display. The regex strips script tags and event handlers.

Regex-based HTML sanitization is a known-bad pattern. The number of ways to bypass regex tag stripping is enormous: encoding tricks, nested tags, malformed attributes, SVG containers. If your sanitization strategy involves a regular expression and the word "HTML," it's probably bypassable.
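
To make the failure mode concrete, here is a sketch of this kind of regex sanitizer (the regex and function are illustrative, not the actual ArgoBox code) and a classic nesting payload that defeats it:

```typescript
// Hypothetical regex "sanitizer" of the pattern described above —
// strips <script> open/close tags in a single pass.
function naiveSanitize(html: string): string {
  return html.replace(/<\/?script[^>]*>/gi, "");
}

// Nesting bypass: removing the inner tags reassembles an outer one.
const payload = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>";
const out = naiveSanitize(payload);
console.log(out); // "<script>alert(1)</script>" — the tag survived "sanitization"
```

The sanitizer runs once, the attacker's payload is built knowing that, and the output is exactly the thing the regex was supposed to remove.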

The fix: use a proper DOM-based sanitizer like DOMPurify that parses the HTML into a tree and walks the nodes. Regex can't do this because HTML isn't a regular language. This has been known since approximately forever.

Markdown Preview

Markdown gets converted to HTML and rendered without sanitization. No DOMPurify. No allowlist. If someone crafts a markdown payload with embedded HTML, it renders directly.

Both vulnerabilities are behind authentication — you need to be a logged-in user to reach them. But "behind auth" is not the same as "safe." Any authenticated user (including a compromised account) could exploit them.


API Security: Six Ways In

The audit flagged 6 distinct API security issues:

1. Debug Endpoint Information Leak. A debug/diagnostic endpoint returns metadata about configured API keys — their lengths and whether they're set. Not the keys themselves. But knowing that OPENAI_API_KEY is 51 characters long and starts with "sk-" narrows the search space.
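
A safer debug payload reports only whether each key is set — never its length or prefix. A minimal sketch (function and variable names are illustrative, not ArgoBox's):

```typescript
// Boolean-only key status: no value.length, no value.slice(0, 3).
type KeyStatus = Record<string, { configured: boolean }>;

function describeKeys(
  env: Record<string, string | undefined>,
  names: string[],
): KeyStatus {
  const out: KeyStatus = {};
  for (const name of names) {
    const value = env[name];
    out[name] = { configured: typeof value === "string" && value.length > 0 };
  }
  return out;
}

const status = describeKeys(
  { OPENAI_API_KEY: "sk-example", RESEND_API_KEY: undefined },
  ["OPENAI_API_KEY", "RESEND_API_KEY"],
);
// status: OPENAI_API_KEY configured, RESEND_API_KEY not configured
```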

2. Auth Inconsistency on Command Endpoint. POST requires authentication. GET does not. The GET handler returns data that should be behind the same auth check. Classic oversight when adding methods to an existing route.
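
One way to prevent this class of drift is a single auth wrapper shared by every method on the route, so adding a GET handler later can't silently skip the check. A framework-agnostic sketch (the request shape and handlers are illustrative):

```typescript
type Req = { headers: Record<string, string | undefined> };
type Handler = (req: Req) => { status: number; body?: unknown };

// Every method passes through the same check — no per-method drift.
function requireAuth(handler: Handler): Handler {
  return (req) => {
    if (!req.headers["authorization"]) return { status: 401 };
    return handler(req);
  };
}

const GET = requireAuth(() => ({ status: 200, body: "command list" }));
const POST = requireAuth(() => ({ status: 200, body: "executed" }));

console.log(GET({ headers: {} }).status); // 401
```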

3. Path Traversal in File API. The file management API validates paths but not thoroughly enough. Relative path components (../) can escape the intended directory boundary. Not a remote exploit — requires authenticated access — but allows reading files outside the sandboxed directory.
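
The robust check is to resolve the requested path against the sandbox root and then verify the result is still inside it. A sketch using Node's path module (the root and file names are illustrative, POSIX paths assumed):

```typescript
import * as path from "path";

// Resolve, then verify containment — rejects "../" escapes regardless of
// how the segments are arranged.
function resolveInside(root: string, requested: string): string | null {
  const base = path.resolve(root);
  const target = path.resolve(base, requested);
  // The path.sep guard prevents "/srv/files-evil" matching a "/srv/files" root.
  if (target !== base && !target.startsWith(base + path.sep)) return null;
  return target;
}

console.log(resolveInside("/srv/files", "notes/todo.md"));   // "/srv/files/notes/todo.md"
console.log(resolveInside("/srv/files", "../../etc/passwd")); // null
```

Checking for the literal substring `../` is not enough; resolving first means every encoding of the escape collapses to the same absolute path before the containment test runs.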

4. SSRF in API Proxy. The proxy endpoint forwards requests to backend services. Without proper URL validation, it can be pointed at internal services, metadata endpoints, or loopback addresses. Server-side request forgery through a feature that's supposed to be helpful.
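
The standard mitigation is an explicit allowlist of proxy targets. A sketch (the host list is illustrative — in practice it would come from configuration):

```typescript
const ALLOWED_HOSTS = new Set(["api.github.com", "api.cloudflare.com"]);

function isAllowedTarget(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input is never forwarded
  }
  // Allowlist, not denylist: loopback, RFC 1918 ranges, and cloud metadata
  // endpoints (169.254.169.254) are rejected simply by not being on the list.
  return url.protocol === "https:" && ALLOWED_HOSTS.has(url.hostname);
}

console.log(isAllowedTarget("https://api.github.com/repos"));  // true
console.log(isAllowedTarget("http://169.254.169.254/latest")); // false
console.log(isAllowedTarget("https://localhost:8080/admin"));  // false
```

A denylist of "bad" addresses is the tempting alternative, but it loses to every address encoding trick; an allowlist fails closed.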

5. Demo Mode Auth Bypass. When demo mode is active, authentication checks are relaxed. If demo mode is accidentally enabled in production — or if the toggle is accessible to users — auth protections disappear.

6. Unsigned JWT Extraction. At least one code path reads JWT tokens without verifying the signature. The token's claims are trusted without proving they haven't been modified. This means a crafted JWT with elevated privileges could be accepted.
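
To show what "verify before trust" means mechanically, here is a minimal HS256 verification sketch using only Node's crypto module — real code should use a maintained library, but the shape is the same: recompute the signature and compare it before reading any claim:

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Returns the claims only if the signature checks out; otherwise null.
function verifyHs256(token: string, secret: string): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  const a = Buffer.from(signature);
  const b = Buffer.from(expected);
  // Constant-time compare; length check first because timingSafeEqual throws.
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null;
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

// Demo token signed with a known secret (purely illustrative values).
const header = Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })).toString("base64url");
const payload = Buffer.from(JSON.stringify({ sub: "user", role: "admin" })).toString("base64url");
const sig = createHmac("sha256", "s3cret").update(`${header}.${payload}`).digest("base64url");
const token = `${header}.${payload}.${sig}`;

console.log(verifyHs256(token, "s3cret"));    // claims object: sub "user", role "admin"
console.log(verifyHs256(token, "wrong-key")); // null — signature check fails
```

The vulnerable pattern is the same function minus the HMAC comparison: split, base64-decode, trust. Anyone can produce a token that passes that.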


Docker: Root Is Not a User Strategy

Three Docker containers — lab-engine, audiobook-engine, argonaut — run as root. No USER directive in their Dockerfiles.

Running as root inside a container means any application vulnerability has maximum privilege within the container. File access, network operations, process management — all unrestricted. Modern Docker containment makes host breakout difficult, but "difficult" and "impossible" are different words.

The fix is simple: add a USER directive. Create a non-root user in the Dockerfile and switch to it. Most applications don't need root. Most applications shouldn't have root.
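
A sketch of the pattern for a Node-based image (the base image and paths are illustrative, not the actual ArgoBox Dockerfiles):

```dockerfile
FROM node:20-alpine

# Create a dedicated unprivileged user and group.
RUN addgroup -S app && adduser -S app -G app

WORKDIR /app
COPY --chown=app:app . .

# Everything from here on — including the running process — is non-root.
USER app
CMD ["node", "server.js"]
```

The `--chown` on COPY matters too: without it, files are owned by root and a non-root process may not be able to read or write them.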


PII That Defeats Its Own Purpose

Two files that should never be in version control:

known_devices.json — MAC addresses and personal device names. My network topology in a JSON file. Device names that follow my naming convention.

identity_map.json — maps real names to sanitized names. This is the anonymization mapping. The file that's supposed to protect identity contains the exact mapping needed to reverse the anonymization. In git.

The identity map is particularly ironic. Sanitized names exist so that if the data leaks, real identities are protected. But the mapping file is in the same repository. Anyone who gets the sanitized data can also get the mapping. The anonymization is decorative.
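
A .gitignore sketch for keeping these files out in the future (filenames taken from the findings above):

```gitignore
# Device inventory and the de-anonymization map — never in version control.
known_devices.json
identity_map.json

# Local environment files
.env
.env.*
```

Note that .gitignore only stops future additions; files already tracked need `git rm --cached <file>` to untrack, and history rewriting to actually purge.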


The Numbers Behind the Audit

The raw scale of what got scanned:

  • 87,478 total files (45,489 excluding node_modules and dependencies)
  • 1,824 TypeScript source files
  • 892 Astro template files
  • 372 API endpoints audited individually
  • 38 packages + 66 modules evaluated for health
  • 2,052 git commits analyzed for history issues
  • 15 audit categories, each with its own detailed report

Beyond the critical findings, the audit turned up a lot of accumulated debris:

  • 67 root-level markdown files where there should be about 17
  • 441 personal journal backup files tracked in git
  • Dual lock files (both package-lock.json and pnpm-lock.yaml) from migrating package managers without cleaning up
  • 300 MB of git history bloat from an embeddings-index.json file that got committed, modified, and recommitted over time
  • 22 stale remote branches that haven't been touched in months
  • 86 console.log statements in production code
  • 301 as any type assertions

None of these are security vulnerabilities. They're the kind of mess that accumulates when you're building fast and not pruning. But they make the codebase harder to audit, harder to onboard into, and harder to trust.


The Good News

Not everything was bad. The auth system graded A-minus — multi-layer defense with Cloudflare Access, JWT validation, and KV role checks. The test suite has 65+ files with 2,713+ test cases and zero skipped. These are signs of a codebase that someone cares about.

The security issues are hygiene problems, not architecture problems. The system's design is sound. The implementation just has credentials in the wrong places, sanitizers that don't sanitize, and auth checks with gaps.

The difference between "bad architecture" and "bad hygiene" matters. Bad architecture means rethinking the system. Bad hygiene means doing the boring work: rotating keys, adding DOMPurify, fixing one auth check, adding one USER directive. Tedious. Mechanical. Each fix is 10-30 minutes. The hard part is doing 15 of them without cutting corners.


The Fix List

15 items. Roughly prioritized by risk.

  1. Rotate all 30+ leaked credentials across every provider (GitHub, Cloudflare, Twilio, Resend, 6 AI providers, infrastructure tokens)
  2. Replace real values in environment-variables.md with descriptive placeholders
  3. Fix both XSS vulnerabilities (DOMPurify for email renderer, DOMPurify for markdown preview)
  4. Add auth to the GET handler on the command endpoint
  5. Validate paths properly in the file API
  6. Restrict the proxy endpoint's allowed targets
  7. Verify JWT signatures in all code paths
  8. Add USER directives to 3 Dockerfiles
  9. Remove known_devices.json and identity_map.json from tracking
  10. Purge secrets from git history with filter-repo
  11. Fix CORS wildcard configurations
  12. Reduce 67 root-level markdown files to ~17
  13. Prune 22 stale remote branches
  14. Remove 441 journal backup files from the repo
  15. Address 300 MB of git bloat from embeddings-index.json history

The credential rotation alone is a full session. Each key needs to be revoked at the provider, regenerated, stored in the credential vault, and updated in whatever deployment configuration references it. 30+ keys across 10+ providers.


What I Chose Not to Do (Yet)

I didn't immediately revoke all credentials. The reasoning: these secrets have been in git for a while. They're in git history regardless of what I do to current files. A few more days of planned, methodical rotation is safer than emergency revocation that might break production services at 11 PM on a Thursday.

The credential vault lives at a private path with restricted access. 48 credentials inventoried. 27-item rotation checklist created with provider-specific instructions. The plan exists. Execution starts next session.


What a Secret Scanner Would Have Caught

Every single one of these findings would have been caught by a pre-commit secret scanner. GitHub's secret scanning. GitLeaks. TruffleHog. detect-secrets. Any of them.

I didn't have one configured. The repo has a test suite with 2,713 test cases, linting, type checking — and zero secret scanning. The irony is that secret scanning is probably the easiest security tool to add. A pre-commit hook. A CI step. 5 minutes of configuration.
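
For the pre-commit route, a sketch of a gitleaks hook via the pre-commit framework (pin `rev` to whatever release is current; the tag below is illustrative):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
```

After `pre-commit install`, every commit gets scanned for secret-shaped strings before it lands in history — which is exactly where you want the check, because once a secret is in history, removal means rewriting it.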

The gap between "I know I should do this" and "I actually did this" is where 30+ credentials leaked into git. Not because the tooling doesn't exist. Because I didn't install it.

For anyone reading this who also doesn't have secret scanning configured: go add it. Right now. Before your next commit. It takes less time than reading this section took.


The Lesson

Documentation pages are code. Treat them like code. Review them like code. Put them through the same secret-scanning pipeline as your source files.

Better yet, never put real values in documentation files at all. Use YOUR_KEY_HERE placeholders. Link to your secrets manager. Reference environment variable names, not values. If you're copying a real token into a file that will be committed, you're one distracted moment away from shipping secrets.

I was that distracted moment, probably at 2 AM, probably during a setup session, probably thinking "I'll fix this later."

87,478 files audited. 15 categories scored. And the biggest finding was a markdown file that was doing its job backwards.