# The Binpkg Cleanup Bug

**Date:** January 28, 2026 · **Duration:** ~3 hours · **Mood:** That satisfying feeling when you finally see it

## The Problem
drone-Izar had accumulated 4.9GB of binary packages. drone-Tarn had 1.7GB.
These should have been cleaned up automatically after upload. That’s… literally what the cleanup function does.
Except it wasn’t.
## Finding It
The cleanup code looked for this:

```python
# Old code - looking for per-package directories
pkg_path = os.path.join(pkgdir, cat, pkg_name)
if os.path.exists(pkg_path):
    shutil.rmtree(pkg_path)
```
But Portage switched to flat `.gpkg.tar` files months ago. There are no directories.

The cleanup was checking for paths that literally don't exist. Every. Single. Time.
```
# What cleanup expected:
/var/cache/binpkgs/kde-frameworks/kconfig/

# What actually exists:
/var/cache/binpkgs/kde-frameworks/kconfig-6.22.0.gpkg.tar
```
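A minimal repro of the mismatch, using a temp dir (the package names are illustrative, not the real binhost contents):

```python
import glob
import os
import tempfile

# Recreate the layout: a category dir containing only a flat GPKG tarball
pkgdir = tempfile.mkdtemp()
cat = "kde-frameworks"
os.makedirs(os.path.join(pkgdir, cat))
flat = os.path.join(pkgdir, cat, "kconfig-6.22.0.gpkg.tar")
open(flat, "w").close()

# The old check: a per-package directory that no longer exists
old_path = os.path.join(pkgdir, cat, "kconfig")
print(os.path.exists(old_path))  # False - "nothing to clean", every time

# The flat file is what's really on disk
print(glob.glob(os.path.join(pkgdir, cat, "kconfig-*.gpkg.tar")) == [flat])  # True
```

The directory check can never succeed against the flat layout, so the cleanup silently became a no-op.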
## The Fix
Added `import glob` and rewrote the cleanup to actually find the files:

```python
# New code - find the actual .gpkg.tar files
gpkg_pattern = os.path.join(pkgdir, cat, f"{pv}.gpkg.tar")
gpkg_files = glob.glob(gpkg_pattern)
for gpkg_file in gpkg_files:
    os.remove(gpkg_file)
Simple. Obvious. Took three hours to notice.
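In context, the fix slots into something like this. This is my reconstruction, not the actual `swarm-drone` code: the function name, parameters, and error handling are assumptions.

```python
import glob
import os


def cleanup_binpkgs(pkgdir, cat, pv):
    """Remove an uploaded binary package stored as a flat GPKG tarball.

    Hypothetical sketch: pkgdir is the binpkg root (e.g. /var/cache/binpkgs),
    cat the category, pv the package-version string (e.g. "kconfig-6.22.0").
    Returns the list of paths actually removed.
    """
    removed = []
    pattern = os.path.join(pkgdir, cat, f"{pv}.gpkg.tar")
    for gpkg_file in glob.glob(pattern):
        try:
            os.remove(gpkg_file)
            removed.append(gpkg_file)
        except OSError:
            # A vanished file just means something else cleaned it first
            pass
    return removed
```

Returning the removed paths makes the no-op case observable, which is exactly what the original silent failure lacked.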
## The Cascade
While I was in there, I found more issues:
**Staging never releasing to production.** Packages were being validated and uploaded to staging (`/var/cache/binpkgs-staging`) but never promoted to the actual binhost. I found 33 packages stuck in staging, 0 in production.
**Health check cutting off early.** The `deploy.sh health` command was dying after section 6 because `((issues++))` returns exit status 1 when incrementing from 0: the post-increment expression evaluates to the old value, 0, which bash arithmetic treats as false. With `set -e` enabled, bash saw that as a failure and aborted.
Changed all of these:

```bash
((issues++))
```

to:

```bash
issues=$((issues + 1))
```
Bash arithmetic. Never trust it.
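The failure mode reproduces in one line. This is a minimal sketch with `bash -c` standing in for the script:

```shell
# ((n++)) post-increments, so the expression evaluates to the OLD value.
# An arithmetic result of 0 gives exit status 1, and set -e aborts on it.
bash -c 'set -e; n=0; ((n++)); echo unreachable' || echo "script died"

# The $((...)) expansion inside an assignment is not a command by itself,
# so there is no exit status for set -e to trip on.
bash -c 'set -e; n=0; n=$((n + 1)); echo "n=$n"'
```

Note the trap only fires on the first increment; once `n` is 1, `((n++))` evaluates to 1 and succeeds, which makes the bug intermittent-looking.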
**KDE Frameworks bootstrap loop.** 69 packages were blocked by slot conflicts: the drones had kde-frameworks 6.20.x installed but were trying to build 6.22.x, and the new versions were sitting in staging instead of production.
## The Aftermath
Moved staging to production:
```bash
for pkg in /var/cache/binpkgs-staging/*/*.gpkg.tar; do
    cat=$(dirname "$pkg" | xargs basename)
    mkdir -p "/var/cache/binpkgs/$cat"
    mv "$pkg" "/var/cache/binpkgs/$cat/"
done

emaint binhost --fix
```
Cleared the blocked packages. Restarted the orchestrator.
```
╔════════════════════════════════════════════════════════════════╗
║                    BUILD SWARM HEALTH CHECK                    ║
╚════════════════════════════════════════════════════════════════╝
1. BINARY VERIFICATION     ✓ All match
2. SERVICE STATUS          ✓ All running
3. GATEWAY CONNECTIVITY    ✓ Online (4 drones, 2 orchestrators)
4. BUILD QUEUE STATUS      Needed: 10 | Blocked: 0
5. GROUNDED DRONES         ✓ None
╔════════════════════════════════════════════════════════════════╗
║                  RESULT: ✓ ALL CHECKS PASSED                   ║
╚════════════════════════════════════════════════════════════════╝
```
## What I Learned
**Portage changes things.** The switch to flat `.gpkg.tar` files happened quietly. My code was still looking for the old directory structure.
**Bash arithmetic is weird.** `((n++))` exits with status 1 when incrementing from 0, because the expression evaluates to 0 and bash treats that as false. With `set -e`, that's a script-killer.
**Check staging.** I assumed the release pipeline was working. It wasn't. 33 packages just… sitting there.
## Files Modified
| File | Changes |
|---|---|
| `bin/swarm-drone` | Added `glob` import, fixed `cleanup_local_binaries()` |
| `bin/swarm-orchestrator` | Added release logic, fixed pattern matching |
| `deploy.sh` | Added health check sections 6-7, fixed arithmetic |
Tomorrow I’m monitoring a full 263-package rebuild. Let’s see what else breaks.