Complete Guide: Btrfs & Snapper for Bulletproof Linux
On October 27, 2024, I lost seven days of work to ext4 corruption. A power fluctuation during VM shutdown mangled the superblock. fsck “recovered” the filesystem by deleting half of /var/db/pkg—the Portage database. The system ran, but couldn’t update, install, or remove packages. It was a zombie.
The next day, I started over with Btrfs. I haven’t lost work since.
This guide covers everything: partition layout, subvolume architecture, mount options, Snapper automation, and Portage integration. By the end, you’ll have a system that creates automatic snapshots before every package update and can roll back a broken upgrade in under two minutes.
Part 1: Why Btrfs?
The Case Against ext4
ext4 is stable and fast. It’s also a 2008-era design with fundamental limitations:
- No atomic updates: A power failure mid-write can corrupt metadata
- No snapshots: Backup means copying files, not capturing system state
- No compression: Disks store uncompressed data
- No data checksums: Silent corruption of file contents goes undetected (ext4 can checksum metadata only)
What Btrfs Provides
| Feature | Benefit |
|---|---|
| Copy-on-Write (COW) | Data is never overwritten; writes create new blocks |
| Snapshots | Capture system state in milliseconds |
| Subvolumes | Logical partitions without repartitioning |
| Compression | 20-30% space savings with zstd |
| Checksums | Detects silent corruption (bit rot) |
| Self-healing | With RAID, automatically repairs corrupt blocks |
The Snapshot Superpower
Here’s what a snapshot-enabled workflow looks like:
# 1. Create snapshot before changes
snapper create --description "Before emerge nvidia-drivers"
# 2. Update package
emerge nvidia-drivers
# 3. Test system
startx # Black screen. GPU driver broken.
# 4. Rollback (2 minutes to full recovery)
snapper rollback
reboot
# Desktop loads. Crisis averted.
Contrast with ext4: reinstall drivers, restore from backup, pray you didn’t miss anything.
Part 2: Partition Layout
Physical Disk Structure
/dev/nvme0n1 (or /dev/sda for SATA)
├── nvme0n1p1: EFI System Partition (512 MB, FAT32)
├── nvme0n1p2: Swap (8-16 GB depending on RAM)
└── nvme0n1p3: Btrfs root (remainder of disk)
Why This Layout?
- EFI partition: Required for UEFI boot. Must be FAT32.
- Swap: Separate partition because swap files on Btrfs require special handling.
- Single Btrfs partition: We use subvolumes for logical separation, not partitions.
Creating Partitions (GPT)
# Using gdisk (or fdisk for older systems)
gdisk /dev/nvme0n1
# Create GPT table
Command: o
Proceed? Y
# Partition 1: EFI (512MB)
Command: n
Partition number: 1
First sector: [Enter]
Last sector: +512M
Hex code: EF00
# Partition 2: Swap (8GB)
Command: n
Partition number: 2
First sector: [Enter]
Last sector: +8G
Hex code: 8200
# Partition 3: Btrfs (remainder)
Command: n
Partition number: 3
First sector: [Enter]
Last sector: [Enter]
Hex code: 8300
# Write and exit
Command: w
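The interactive session above can also be scripted with sgdisk, which ships in the same gptfdisk package as gdisk. This is a minimal sketch using the same sizes and type codes; the `partition_disk` function and the `RUN` variable are my own illustrative names, and the script only prints the commands unless you explicitly clear `RUN`.

```shell
# Hypothetical helper: prints the sgdisk commands by default (dry run).
# Set RUN="" in the environment to execute them. This destroys all data on the disk.
RUN="${RUN-echo}"

partition_disk() {
    disk="$1"
    $RUN sgdisk --zap-all "$disk"                 # wipe existing GPT/MBR structures
    $RUN sgdisk -n 1:0:+512M -t 1:EF00 "$disk"    # partition 1: EFI System Partition
    $RUN sgdisk -n 2:0:+8G -t 2:8200 "$disk"      # partition 2: Linux swap
    $RUN sgdisk -n 3:0:0 -t 3:8300 "$disk"        # partition 3: Linux filesystem, rest of disk
}

partition_disk /dev/nvme0n1
```

Review the printed commands against your actual disk name before running them for real.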
Formatting
# EFI
mkfs.fat -F32 /dev/nvme0n1p1
# Swap
mkswap /dev/nvme0n1p2
# Btrfs with label
mkfs.btrfs -L "ArgoOS" /dev/nvme0n1p3
Part 3: Subvolume Architecture
The Concept
Subvolumes are independent filesystem trees within a Btrfs partition. They can be:
- Mounted independently
- Snapshotted independently
- Excluded from parent snapshots
This is critical: we snapshot the system but not the cache.
Subvolume Layout (openSUSE-style)
# Mount the raw Btrfs partition
mount /dev/nvme0n1p3 /mnt
# Create subvolumes
btrfs subvolume create /mnt/@ # Root filesystem
btrfs subvolume create /mnt/@home # User data
btrfs subvolume create /mnt/@snapshots # Snapshot storage
btrfs subvolume create /mnt/@var-cache # Package cache
btrfs subvolume create /mnt/@var-log # System logs
# Unmount
umount /mnt
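If you expect to repeat this step (for example while testing in a VM), the same subvolume list can be created in a loop that skips anything that already exists. A rough sketch: `create_subvolumes`, `SUBVOLUMES`, and the overridable `BTRFS` variable are hypothetical names I introduced for illustration.

```shell
# Hypothetical helper. BTRFS defaults to the real binary; set BTRFS="echo btrfs"
# to see what would be run without touching the filesystem.
BTRFS="${BTRFS:-btrfs}"
SUBVOLUMES="@ @home @snapshots @var-cache @var-log"

create_subvolumes() {
    top="$1"    # mount point of the raw Btrfs partition, e.g. /mnt
    for sv in $SUBVOLUMES; do
        if [ -e "$top/$sv" ]; then
            echo "skip $sv (already exists)"
        else
            $BTRFS subvolume create "$top/$sv"
        fi
    done
}

# Usage, with the partition mounted at /mnt:
#   create_subvolumes /mnt
```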
Why This Specific Structure?
| Subvolume | Mounted At | Snapshot Behavior | Rationale |
|---|---|---|---|
| @ | / | Frequent | System files. Roll back breaking changes. |
| @home | /home | Separate schedule | User data has different lifecycle than system. |
| @snapshots | /.snapshots | Never | Snapshotting snapshots = infinite recursion. |
| @var-cache | /var/cache | Excluded | Cache is re-downloadable. Saves 20-30GB. |
| @var-log | /var/log | Excluded | Logs must persist through rollback (for debugging). |
The Log Exclusion Paradox
Why exclude /var/log? Consider this scenario:
- You run emerge nvidia-drivers
- System breaks
- You rollback to pre-emerge snapshot
- Logs are restored to pre-emerge state
- The error messages from the failure are gone
By keeping logs in a separate subvolume, you preserve the evidence of what broke.
Part 4: Mount Configuration
Mount Options Explained
mount -o compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@ /dev/nvme0n1p3 /mnt
| Option | Purpose |
|---|---|
| compress=zstd:3 | Transparent compression. Level 3 balances ratio vs speed. |
| noatime | Don’t update access timestamps. Major performance win. |
| discard=async | SSD TRIM commands batched (better performance than sync). |
| space_cache=v2 | Improved free space tracking algorithm. |
| subvol=/@ | Mount specific subvolume, not root of partition. |
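After mounting, it is worth confirming that the options actually took effect, since a typo in the option string is easy to miss. The sketch below checks a comma-separated option string (such as the output of `findmnt -no OPTIONS /`) for a list of expected options; `has_mount_opts` is a hypothetical helper, and the sample string is made up for the demonstration.

```shell
# Hypothetical helper: succeeds only if every expected option appears in the
# comma-separated option string passed as the first argument.
has_mount_opts() {
    active="$1"; shift
    for opt in "$@"; do
        case ",$active," in
            *",$opt,"*) ;;                          # option present
            *) echo "missing: $opt"; return 1 ;;
        esac
    done
}

# Demonstration with a sample option string. On a live system use:
#   has_mount_opts "$(findmnt -no OPTIONS /)" noatime compress=zstd:3 discard=async
has_mount_opts "rw,noatime,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/@" \
    noatime compress=zstd:3 discard=async && echo "all options active"
```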
Complete /etc/fstab
# Root filesystem (snapshotted)
/dev/nvme0n1p3 / btrfs defaults,compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@ 0 0
# Home directory (separate snapshot schedule)
/dev/nvme0n1p3 /home btrfs defaults,compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@home 0 0
# Snapshots directory (never snapshot this)
/dev/nvme0n1p3 /.snapshots btrfs defaults,noatime,discard=async,space_cache=v2,subvol=/@snapshots 0 0
# Cache (excluded from snapshots, no compression needed)
/dev/nvme0n1p3 /var/cache btrfs defaults,noatime,discard=async,space_cache=v2,subvol=/@var-cache 0 0
# Logs (excluded from snapshots, for debugging)
/dev/nvme0n1p3 /var/log btrfs defaults,compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@var-log 0 0
# EFI partition
/dev/nvme0n1p1 /boot/efi vfat defaults,noatime 0 2
# Swap
/dev/nvme0n1p2 none swap sw 0 0
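One caveat with the fstab above: device names like /dev/nvme0n1p3 can change when disks are added or reordered. A common alternative is to reference the filesystem by UUID. The sketch below generates the Btrfs lines in that form; `fstab_line` and `COMMON` are names I made up, the UUID is a placeholder, and I dropped the redundant `defaults` keyword.

```shell
COMMON="noatime,discard=async,space_cache=v2"

# fstab_line <uuid> <mount point> <subvolume> [extra options]
fstab_line() {
    opts="$COMMON"
    [ -n "${4:-}" ] && opts="$4,$opts"
    printf 'UUID=%s  %s  btrfs  %s,subvol=/%s  0 0\n' "$1" "$2" "$opts" "$3"
}

# On a real system: uuid="$(blkid -s UUID -o value /dev/nvme0n1p3)"
uuid="00000000-0000-0000-0000-000000000000"   # placeholder, not a real UUID
fstab_line "$uuid" /           @          compress=zstd:3
fstab_line "$uuid" /home       @home      compress=zstd:3
fstab_line "$uuid" /.snapshots @snapshots
fstab_line "$uuid" /var/cache  @var-cache
fstab_line "$uuid" /var/log    @var-log   compress=zstd:3
```

Paste the output into /etc/fstab only after checking it against your own layout.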
Creating Mount Points
mkdir -p /mnt/{home,.snapshots,var/cache,var/log,boot/efi}
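With the mount points in place, the remaining subvolumes and the EFI partition still need to be mounted under /mnt before installing. A minimal sketch of that step, assuming the device names used in this guide; `mount_subvolumes`, `MOUNT`, `DEV`, and `EFI_DEV` are hypothetical names, and compression options are left out for brevity.

```shell
# Hypothetical helper. Set MOUNT="echo mount" for a dry run.
MOUNT="${MOUNT:-mount}"
DEV="${DEV:-/dev/nvme0n1p3}"
EFI_DEV="${EFI_DEV:-/dev/nvme0n1p1}"
OPTS="noatime,discard=async,space_cache=v2"

mount_subvolumes() {
    target="$1"    # where the @ subvolume is already mounted, e.g. /mnt
    # subvolume:mount-point pairs, matching the fstab in this guide
    for pair in @home:home @snapshots:.snapshots @var-cache:var/cache @var-log:var/log; do
        sv="${pair%%:*}"
        mp="${pair#*:}"
        $MOUNT -o "$OPTS,subvol=/$sv" "$DEV" "$target/$mp"
    done
    $MOUNT "$EFI_DEV" "$target/boot/efi"    # EFI System Partition
}

# Usage: mount_subvolumes /mnt
```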
Part 5: Snapper Setup
Installation
emerge app-backup/snapper sys-boot/grub-btrfs
- snapper: Snapshot management (from openSUSE)
- grub-btrfs: Adds bootable snapshots to GRUB menu
Initial Configuration
# Clear any existing config
echo 'SNAPPER_CONFIGS=""' > /etc/conf.d/snapper
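# Create root configuration. create-config creates its own /.snapshots subvolume,
# so it fails while @snapshots is mounted there: unmount first, then swap back.
umount /.snapshots
rmdir /.snapshots
snapper -c root create-config /
btrfs subvolume delete /.snapshots
mkdir /.snapshots
mount /.snapshots # remounts @snapshots using the fstab entry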
# Enable the config
echo 'SNAPPER_CONFIGS="root"' > /etc/conf.d/snapper
# Verify
snapper list
Configuration File
File: /etc/snapper/configs/root
# snapper configuration for root filesystem
# Subvolume to snapshot
SUBVOLUME="/"
# Filesystem type
FSTYPE="btrfs"
# Allow unprivileged users (leave empty for root-only)
ALLOW_USERS=""
ALLOW_GROUPS=""
# Create timeline snapshots
TIMELINE_CREATE="yes"
# Cleanup old snapshots by timeline
TIMELINE_CLEANUP="yes"
# Minimum age before cleanup (seconds)
TIMELINE_MIN_AGE="1800" # 30 minutes
# Retention counts
TIMELINE_LIMIT_HOURLY="10" # Keep last 10 hourly
TIMELINE_LIMIT_DAILY="7" # Keep last 7 daily
TIMELINE_LIMIT_WEEKLY="0" # Don't keep weekly
TIMELINE_LIMIT_MONTHLY="3" # Keep last 3 monthly
TIMELINE_LIMIT_YEARLY="0" # Don't keep yearly
# Space limits
SPACE_LIMIT="0.5" # Don't use more than 50% of disk
FREE_LIMIT="0.2" # Keep at least 20% free
# Background comparison (for diff operations)
BACKGROUND_COMPARISON="yes"
# Number cleanup (cleanup by snapshot count)
NUMBER_CLEANUP="yes"
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="50"
NUMBER_LIMIT_IMPORTANT="10"
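To get a feel for what the retention counts add up to, you can sum the TIMELINE_LIMIT_* values: for the config above that is 10 + 7 + 0 + 3 + 0 = 20, a rough upper bound on the timeline snapshots kept after cleanup (snapshots younger than TIMELINE_MIN_AGE are kept on top of that). The sketch below does the sum from a config file; `timeline_max` is a hypothetical helper and it assumes plain integer limits written as KEY="N".

```shell
# Sum the TIMELINE_LIMIT_* values in a snapper config file.
# Assumes each limit is a plain integer in double quotes, as in the config above.
timeline_max() {
    sed -n 's/^TIMELINE_LIMIT_[A-Z]*="\([0-9][0-9]*\)".*/\1/p' "$1" |
        awk '{ total += $1 } END { print total + 0 }'
}

# On a configured system:
#   timeline_max /etc/snapper/configs/root
```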
OpenRC Automation (Cron-Based)
Snapper was designed for systemd timers. On OpenRC, we use cron.
Hourly Timeline Snapshots:
cat > /etc/cron.hourly/snapper-timeline << 'EOF'
#!/bin/bash
/usr/bin/snapper -c root create --cleanup-algorithm timeline --description "timeline"
EOF
chmod +x /etc/cron.hourly/snapper-timeline
Daily Cleanup:
cat > /etc/cron.daily/snapper-cleanup << 'EOF'
#!/bin/bash
/usr/bin/snapper -c root cleanup timeline
/usr/bin/snapper -c root cleanup number
EOF
chmod +x /etc/cron.daily/snapper-cleanup
Enable cron:
emerge sys-process/cronie
rc-update add cronie default
rc-service cronie start
Part 6: Automatic Pre/Post Emerge Snapshots
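This is the killer feature: every package that emerge merges automatically gets a snapshot before and after the operation. Because the hooks run per package, a large update (a world upgrade, for example) creates two snapshots for each package it touches.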
Pre-Emerge Snapshot
File: /etc/portage/bashrc.d/snapper-pre.sh
#!/bin/bash
if [[ ${EBUILD_PHASE} == "setup" ]]; then
snapper -c root create --description "Before emerge ${CATEGORY}/${PN}-${PVR}"
fi
Post-Emerge Snapshot
File: /etc/portage/bashrc.d/snapper-post.sh
#!/bin/bash
if [[ ${EBUILD_PHASE} == "postinst" ]]; then
snapper -c root create --description "After emerge ${CATEGORY}/${PN}-${PVR}"
fi
Make executable:
chmod +x /etc/portage/bashrc.d/snapper-*.sh
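The two hooks above create independent snapshots. Snapper can also link a "before" and "after" snapshot as a pre/post pair, which is what makes the pre and post types show up in `snapper list`. Here is a sketch of that variant as a single hook file, assuming it is sourced by Portage the same way as the two files above. `snap_pre`, `snap_post`, and `STATE_DIR` are hypothetical names; the state directory under /run is my assumption for carrying the pre snapshot number between phases.

```shell
# Sketch only: assumes Portage sources this file in every ebuild phase and that
# STATE_DIR is writable by the user running the phase. Names are hypothetical.
STATE_DIR="${STATE_DIR:-/run/snapper-portage}"

snap_pre() {    # $1 = package label, e.g. www-client/firefox-121.0
    mkdir -p "$STATE_DIR" || return 1
    num="$(snapper -c root create --type pre --print-number \
        --cleanup-algorithm number --description "emerge $1")" || return 1
    echo "$num" > "$STATE_DIR/$(echo "$1" | tr '/' '_')"
}

snap_post() {   # $1 = the same package label used for snap_pre
    state="$STATE_DIR/$(echo "$1" | tr '/' '_')"
    [ -f "$state" ] || return 0          # no matching pre snapshot was recorded
    snapper -c root create --type post --pre-number "$(cat "$state")" \
        --cleanup-algorithm number --description "emerge $1"
    rm -f "$state"
}

case "${EBUILD_PHASE:-}" in
    setup)    snap_pre  "${CATEGORY}/${PN}-${PVR}" ;;
    postinst) snap_post "${CATEGORY}/${PN}-${PVR}" ;;
esac
```

Tagging both snapshots with the number cleanup algorithm lets the daily `snapper cleanup number` job prune old pairs instead of letting them accumulate.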
Result
emerge firefox
# Snapper automatically creates:
# Snapshot #15: "Before emerge www-client/firefox-121.0"
# Snapshot #16: "After emerge www-client/firefox-121.0"
# To see what changed:
snapper diff 15..16
# To undo the update:
snapper undochange 15..16
Part 7: GRUB Integration
grub-btrfs adds every snapshot as a bootable menu entry. If your desktop won’t start, you can boot directly into a snapshot from GRUB.
Configuration
File: /etc/default/grub-btrfs/config
# Location of grub.cfg
GRUB_BTRFS_GRUB_DIRNAME="/boot/grub"
# Name of the snapshots submenu shown in GRUB
GRUB_BTRFS_SUBMENUNAME="Gentoo Linux snapshots"
# Snapshot location
GRUB_BTRFS_SNAPSHOT_DIR="/.snapshots"
# Boot prefix
GRUB_BTRFS_BORE="true"
Regenerate GRUB After Snapshots
# Add hook to regenerate GRUB when snapshots change
cat > /etc/cron.hourly/grub-btrfs << 'EOF'
#!/bin/bash
/usr/sbin/grub-mkconfig -o /boot/grub/grub.cfg
EOF
chmod +x /etc/cron.hourly/grub-btrfs
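Regenerating grub.cfg every hour is wasteful when no snapshot was created or deleted. One possible refinement is to regenerate only when the contents of /.snapshots changed since the last run. A sketch, with a hypothetical helper `snapshots_changed` and a state file path that is purely my assumption:

```shell
# Hypothetical helper: succeeds (exit 0) when the set of snapshot directories
# differs from the listing saved on the previous run, and updates that record.
snapshots_changed() {
    snapdir="$1"      # e.g. /.snapshots
    statefile="$2"    # e.g. /var/lib/grub-btrfs-last-list (assumed path)
    current="$(ls -1 "$snapdir" 2>/dev/null | sort)"
    previous="$(cat "$statefile" 2>/dev/null)"
    [ "$current" = "$previous" ] && return 1
    printf '%s\n' "$current" > "$statefile"
}

# In the cron script:
#   snapshots_changed /.snapshots /var/lib/grub-btrfs-last-list &&
#       grub-mkconfig -o /boot/grub/grub.cfg
```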
Boot Menu Result
After configuration, your GRUB menu shows:
Gentoo Linux
Gentoo Linux (advanced options)
Gentoo Linux snapshots --->
Snapshot 16: After emerge www-client/firefox-121.0
Snapshot 15: Before emerge www-client/firefox-121.0
Snapshot 14: timeline (hourly)
...
Part 8: Common Operations
List Snapshots
snapper list
# Output:
# Type | # | Pre # | Cleanup | Description
# single | 0 | | | current
# pre | 1 | | number | Before emerge sys-libs/glibc
# post | 2 | 1 | number | After emerge sys-libs/glibc
# single | 3 | | timeline| timeline
Create Manual Snapshot
snapper create --description "Before dangerous experiment"
Compare Snapshots
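# Show content differences between snapshot 5 and the current system (0)
snapper diff 5..0
# Show content differences between snapshots 5 and 10
snapper diff 5..10
# For just the list of changed files, use status instead
snapper status 5..10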
Undo Changes
# Undo changes made between snapshots 5 and 6
snapper undochange 5..6
# This restores files to snapshot 5 state
Full Rollback
# Boot into snapshot from GRUB menu, then:
snapper rollback
# Or specify snapshot number:
snapper rollback 15
# Reboot to apply
reboot
Delete Snapshot
snapper delete 15
# Or range:
snapper delete 10-20
Part 9: Space Management
Check Compression Effectiveness
# Install compsize
emerge sys-fs/compsize
# Check compression ratio
compsize /
# Output:
# Processed 145632 files
# Type Original Disk Usage Ratio
# zstd 85.4G 52.1G 1.63
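To turn those two sizes into a savings percentage: 52.1G on disk for 85.4G of data means about 39% of the space was saved. The arithmetic as a small sketch (`savings_pct` is a hypothetical helper name):

```shell
# savings_pct <original size> <on-disk size>: percentage saved, one decimal place.
savings_pct() {
    awk -v orig="$1" -v disk="$2" 'BEGIN { printf "%.1f\n", (1 - disk / orig) * 100 }'
}

savings_pct 85.4 52.1    # prints 39.0 for the sample figures above
```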
Monitor Snapshot Space
# Total snapshot space
btrfs filesystem usage /
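# Individual snapshot sizes (approximate; "Usage referenced" is only reported when quotas are enabled)
for snap in $(snapper list --columns number | tail -n +3 | tr -cd '0-9\n'); do
[ "$snap" -eq 0 ] && continue # 0 is the running system, not a snapshot
size=$(btrfs subvolume show "/.snapshots/${snap}/snapshot" 2>/dev/null | grep "Usage referenced" | awk '{print $NF}')
echo "Snapshot $snap: $size"
done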
Force Cleanup When Low on Space
snapper cleanup number
snapper cleanup timeline
# If still low, delete old snapshots manually
snapper delete 1-10
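The cleanup can also be triggered automatically once usage crosses a threshold. A minimal sketch: `parse_used_pct` and `cleanup_if_full` are hypothetical helpers, the 80% threshold in the usage line is an arbitrary example, and df figures on Btrfs are only an estimate (`btrfs filesystem usage` gives the detailed picture).

```shell
# parse_used_pct: read `df -P` output on stdin, print the used percentage as an integer.
parse_used_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# cleanup_if_full <path> <threshold percent>: run both cleanup algorithms
# when the filesystem holding <path> is at or above the threshold.
cleanup_if_full() {
    pct="$(df -P "$1" | parse_used_pct)"
    if [ "$pct" -ge "$2" ]; then
        snapper -c root cleanup number
        snapper -c root cleanup timeline
    fi
}

# Example (as root): cleanup_if_full / 80
```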
Part 10: Troubleshooting
“snapper: PAM authentication failed”
Snapper tries to use dbus/polkit for authentication on OpenRC, which often fails.
Fix: Run as root or configure passwordless snapper:
# In /etc/snapper/configs/root
ALLOW_USERS="yourusername"
“Cannot create snapshot: subvolume not mounted”
The .snapshots directory must be a mounted subvolume.
# Verify it's in fstab
grep snapshots /etc/fstab
# Verify it's mounted
mount | grep snapshots
# If not mounted, mount it
mount -o subvol=/@snapshots /dev/nvme0n1p3 /.snapshots
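The grep checks above only show that something with "snapshots" in its name is mounted. To confirm that /.snapshots is specifically the @snapshots subvolume, you can inspect the mount table directly. A sketch with a hypothetical helper `subvol_mounted`, which reads a file in /proc/mounts format:

```shell
# subvol_mounted <mounts file> <mount point> <subvolume path>
# Succeeds if the mount table lists <mount point> as btrfs with subvol=<subvolume path>.
subvol_mounted() {
    awk -v mp="$2" -v sv="subvol=$3" '
        $2 == mp && $3 == "btrfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == sv) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# On a live system:
#   subvol_mounted /proc/mounts /.snapshots /@snapshots || echo "/.snapshots is not mounted correctly"
```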
GRUB Doesn’t Show Snapshots
# Regenerate GRUB config
grub-mkconfig -o /boot/grub/grub.cfg
# Check that grub-btrfs is finding snapshots
/etc/grub.d/41_snapshots-btrfs
# Verify grub-btrfs is executable
ls -la /etc/grub.d/41_snapshots-btrfs
System Won’t Boot After Rollback
If you rolled back to a snapshot but forgot that the rollback point doesn’t include /boot:
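# Boot from live USB
mount -o subvol=/@ /dev/nvme0n1p3 /mnt
mount /dev/nvme0n1p1 /mnt/boot/efi
# Bind the virtual filesystems that emerge and grub-mkconfig need inside the chroot
mount --rbind /dev /mnt/dev
mount --rbind /sys /mnt/sys
mount -t proc /proc /mnt/proc
chroot /mnt /bin/bash
# Rebuild kernel modules, the initramfs, and the GRUB config
emerge @module-rebuild
dracut --force
grub-mkconfig -o /boot/grub/grub.cfg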
This configuration has saved me from dozens of failed upgrades. The key insight: treat your system as immutable-by-default. Every change creates a snapshot. Every mistake is reversible. Sleep well knowing that rm -rf /lib is a 2-minute recovery, not a 2-day reinstall.