Complete Guide: Btrfs & Snapper for Bulletproof Linux

On October 27, 2024, I lost seven days of work to ext4 corruption. A power fluctuation during VM shutdown mangled the superblock. fsck “recovered” the filesystem by deleting half of /var/db/pkg—the Portage database. The system ran, but couldn’t update, install, or remove packages. It was a zombie.

The next day, I started over with Btrfs. I haven’t lost work since.

This guide covers everything: partition layout, subvolume architecture, mount options, Snapper automation, and Portage integration. By the end, you’ll have a system that creates automatic snapshots before every package update and can roll back a broken upgrade in under two minutes.

Part 1: Why Btrfs?

The Case Against ext4

ext4 is stable and fast. It’s also a 2008-era design with fundamental limitations:

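  • In-place writes: ext4 journals metadata, but file data is overwritten in place, so a power failure mid-write can leave files half-written
  • No snapshots: Backup means copying files, not capturing system state
  • No compression: Everything is stored uncompressed
  • No data checksums: Metadata can be checksummed (metadata_csum), but file contents are not, so silent corruption goes undetected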

What Btrfs Provides

Feature              Benefit
Copy-on-Write (COW)  Data is never overwritten in place; writes go to new blocks
Snapshots            Capture system state in milliseconds
Subvolumes           Logical partitions without repartitioning
Compression          Typically 20-30% space savings with zstd (depends on the data)
Checksums            Detects silent corruption (bit rot) in data and metadata
Self-healing         With redundant copies (RAID1, DUP), automatically repairs corrupt blocks
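Checksums only catch corruption when the data is actually read. A scrub re-reads and verifies every block, and Btrfs keeps per-device error counters you can inspect afterwards. Here is a minimal sketch that flags non-zero counters; it assumes `btrfs device stats` prints one `[device].counter value` pair per line, and `check_btrfs_errors` is a hypothetical helper name, not part of btrfs-progs.

```shell
#!/bin/bash
# Sketch: flag non-zero Btrfs per-device error counters.
# Assumes `btrfs device stats` output lines look like:
#   [/dev/nvme0n1p3].corruption_errs   0
# check_btrfs_errors is a hypothetical helper, not part of btrfs-progs.

# Reads device-stats output on stdin, prints each non-zero counter,
# and returns 1 if any counter is non-zero (0 otherwise).
check_btrfs_errors() {
    awk '
        NF >= 2 && $2 != 0 { print "ERROR: " $1 " = " $2; bad = 1 }
        END { exit bad }
    '
}
```

Usage, as root: `btrfs scrub start -B /` verifies every checksum in the foreground, then `btrfs device stats / | check_btrfs_errors && echo "no errors recorded"`.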

The Snapshot Superpower

Here’s what a snapshot-enabled workflow looks like:

# 1. Create snapshot before changes (prints the new snapshot number, e.g. 42)
snapper create --print-number --description "Before emerge nvidia-drivers"

# 2. Update package
emerge nvidia-drivers

# 3. Test system
startx  # Black screen. GPU driver broken.

# 4. Roll back: revert every file changed since snapshot 42 (2 minutes to full recovery)
snapper undochange 42..0
reboot
# Desktop loads. Crisis averted.

Contrast with ext4: reinstall drivers, restore from backup, pray you didn’t miss anything.
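The same workflow can be wrapped in a small function so the snapshot is never forgotten. This is a sketch, assuming snapper's `create` options `--type pre/post`, `--print-number`, `--pre-number`, and `--cleanup-algorithm`; `snap_run` is a hypothetical name. It takes one pre/post pair around the whole command, so a world update produces one pair rather than one per package.

```shell
#!/bin/bash
# Sketch: run a command between a snapper pre/post snapshot pair.
# snap_run is a hypothetical helper built on documented snapper options.

# snap_run DESCRIPTION COMMAND [ARGS...]
snap_run() {
    local desc=$1; shift
    local pre rc
    # Create the "pre" snapshot and capture its number
    pre=$(snapper -c root create --type pre --print-number \
        --cleanup-algorithm number --description "$desc") || return 1
    # Run the wrapped command and remember its exit status
    "$@"
    rc=$?
    # Create the matching "post" snapshot even if the command failed
    snapper -c root create --type post --pre-number "$pre" \
        --cleanup-algorithm number --description "$desc" || return 1
    echo "Pre snapshot #${pre}; revert with: snapper -c root undochange ${pre}..0" >&2
    # Pass the wrapped command's exit status through
    return "$rc"
}
```

Usage: `snap_run "world update" emerge -uDN @world`.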

Part 2: Partition Layout

Physical Disk Structure

/dev/nvme0n1 (or /dev/sda for SATA)
├── nvme0n1p1: EFI System Partition (512 MB, FAT32)
├── nvme0n1p2: Swap (8-16 GB depending on RAM)  
└── nvme0n1p3: Btrfs root (remainder of disk)

Why This Layout?

  • EFI partition: Required for UEFI boot. Must be FAT32.
  • Swap: Separate partition because swap files on Btrfs require special handling.
  • Single Btrfs partition: We use subvolumes for logical separation, not partitions.

Creating Partitions (GPT)

# Using gdisk (or fdisk for older systems)
gdisk /dev/nvme0n1

# Create GPT table
Command: o
Proceed? Y

# Partition 1: EFI (512MB)
Command: n
Partition number: 1
First sector: [Enter]
Last sector: +512M
Hex code: EF00

# Partition 2: Swap (8GB)  
Command: n
Partition number: 2
First sector: [Enter]
Last sector: +8G
Hex code: 8200

# Partition 3: Btrfs (remainder)
Command: n
Partition number: 3
First sector: [Enter]
Last sector: [Enter]
Hex code: 8300

# Write and exit
Command: w

Formatting

# EFI
mkfs.fat -F32 /dev/nvme0n1p1

# Swap
mkswap /dev/nvme0n1p2

# Btrfs with label
mkfs.btrfs -L "ArgoOS" /dev/nvme0n1p3
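Partition device names differ between NVMe and SATA disks: the kernel inserts a `p` before the partition number when the disk name ends in a digit (nvme0n1p3, mmcblk0p1) and not otherwise (sda3). The sketch below formats all three partitions for either kind of disk; `part_path`, `format_disk`, and `DRY_RUN` are hypothetical names.

```shell
#!/bin/bash
# Sketch: format the ESP, swap, and Btrfs partitions for any disk name.
# part_path, format_disk, and DRY_RUN are hypothetical; with DRY_RUN=1
# the commands are printed instead of run.

run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# part_path DISK N -> device path of partition N
part_path() {
    local disk=$1 n=$2
    if [[ $disk =~ [0-9]$ ]]; then
        echo "${disk}p${n}"     # nvme0n1 -> nvme0n1p3
    else
        echo "${disk}${n}"      # sda -> sda3
    fi
}

# format_disk DISK [LABEL]
format_disk() {
    local disk=$1 label=${2:-ArgoOS}
    run mkfs.fat -F32 "$(part_path "$disk" 1)"
    run mkswap "$(part_path "$disk" 2)"
    run mkfs.btrfs -L "$label" "$(part_path "$disk" 3)"
}
```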

Part 3: Subvolume Architecture

The Concept

Subvolumes are independent filesystem trees within a Btrfs partition. They can be:

  • Mounted independently
  • Snapshotted independently
  • Excluded from parent snapshots

This is critical: we snapshot the system but not the cache.

Subvolume Layout (openSUSE-style)

# Mount the raw Btrfs partition
mount /dev/nvme0n1p3 /mnt

# Create subvolumes
btrfs subvolume create /mnt/@           # Root filesystem
btrfs subvolume create /mnt/@home       # User data
btrfs subvolume create /mnt/@snapshots  # Snapshot storage
btrfs subvolume create /mnt/@var-cache  # Package cache
btrfs subvolume create /mnt/@var-log    # System logs

# Unmount
umount /mnt

Why This Specific Structure?

Subvolume   Mounted at   Snapshot behavior   Rationale
@           /            Frequent            System files. Roll back breaking changes.
@home       /home        Separate schedule   User data has a different lifecycle than the system.
@snapshots  /.snapshots  Never               Lives outside @, so the snapshot history survives a rollback of @.
@var-cache  /var/cache   Excluded            Cache is re-downloadable. Keeps 20-30 GB of distfiles out of snapshots.
@var-log    /var/log     Excluded            Logs must persist through rollback (for debugging).

The Log Exclusion Paradox

Why exclude /var/log? Consider this scenario:

  1. You run emerge nvidia-drivers
  2. System breaks
  3. You rollback to pre-emerge snapshot
  4. Logs are restored to pre-emerge state
  5. The error messages from the failure are gone

By keeping logs in a separate subvolume, you preserve the evidence of what broke.

Part 4: Mount Configuration

Mount Options Explained

mount -o compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@ /dev/nvme0n1p3 /mnt

Option           Purpose
compress=zstd:3  Transparent compression. Level 3 (zstd's default) balances ratio vs. speed.
noatime          Don't update access timestamps on reads. Avoids constant metadata writes (and COW churn).
discard=async    Batches SSD TRIM commands in the background (better performance than synchronous discard).
space_cache=v2   Uses the free space tree, the improved free-space tracking structure.
subvol=/@        Mounts a specific subvolume, not the top level of the filesystem.

Complete /etc/fstab

# Root filesystem (snapshotted)
/dev/nvme0n1p3  /              btrfs  defaults,compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@           0 0

# Home directory (separate snapshot schedule)
/dev/nvme0n1p3  /home          btrfs  defaults,compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@home       0 0

# Snapshots directory (never snapshot this)
/dev/nvme0n1p3  /.snapshots    btrfs  defaults,noatime,discard=async,space_cache=v2,subvol=/@snapshots                  0 0

# Cache (excluded from snapshots; compress is omitted here, but Btrfs applies compress filesystem-wide, so new writes are still compressed)
/dev/nvme0n1p3  /var/cache     btrfs  defaults,noatime,discard=async,space_cache=v2,subvol=/@var-cache                  0 0

# Logs (excluded from snapshots, for debugging)
/dev/nvme0n1p3  /var/log       btrfs  defaults,compress=zstd:3,noatime,discard=async,space_cache=v2,subvol=/@var-log    0 0

# EFI partition
/dev/nvme0n1p1  /boot/efi      vfat   defaults,noatime                                                                   0 2

# Swap
/dev/nvme0n1p2  none           swap   sw                                                                                  0 0
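Device names like /dev/nvme0n1p3 can change when disks are added or reordered, while filesystem UUIDs do not. A sketch that generates the Btrfs lines from one list of subvolume:mountpoint pairs using the UUID; `fstab_btrfs_lines` is a hypothetical helper. It applies one option set to every subvolume, which matches what the kernel does anyway, because Btrfs applies options such as compress to the whole filesystem rather than per subvolume.

```shell
#!/bin/bash
# Sketch: print Btrfs fstab lines keyed by UUID instead of device name.
# fstab_btrfs_lines is a hypothetical helper.

BTRFS_OPTS="defaults,compress=zstd:3,noatime,discard=async,space_cache=v2"

# fstab_btrfs_lines UUID SUBVOL:MOUNTPOINT...
fstab_btrfs_lines() {
    local uuid=$1; shift
    local pair subvol mnt
    for pair in "$@"; do
        subvol=${pair%%:*}    # text before the first colon, e.g. @home
        mnt=${pair#*:}        # text after it, e.g. /home
        printf 'UUID=%s  %-12s  btrfs  %s,subvol=/%s  0 0\n' \
            "$uuid" "$mnt" "$BTRFS_OPTS" "$subvol"
    done
}
```

Usage: `uuid=$(blkid -s UUID -o value /dev/nvme0n1p3)`, then `fstab_btrfs_lines "$uuid" @:/ @home:/home @snapshots:/.snapshots @var-cache:/var/cache @var-log:/var/log >> /etc/fstab`.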

Creating Mount Points

mkdir -p /mnt/{home,.snapshots,var/cache,var/log,boot/efi}
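Each subvolume then has to be mounted at its place under /mnt before continuing the installation. A sketch of that loop, assuming the root subvolume is already mounted at /mnt with the command shown earlier; `mount_subvols` and `DRY_RUN` are hypothetical names.

```shell
#!/bin/bash
# Sketch: mount subvolumes under an install target such as /mnt.
# mount_subvols and DRY_RUN are hypothetical; with DRY_RUN=1 the mount
# commands are printed instead of run.

run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# mount_subvols DEVICE TARGET SUBVOL:PATH...
mount_subvols() {
    local dev=$1 target=$2; shift 2
    local opts="compress=zstd:3,noatime,discard=async,space_cache=v2"
    local pair
    for pair in "$@"; do
        # e.g. @home:/home -> subvol=/@home mounted at TARGET/home
        run mount -o "${opts},subvol=/${pair%%:*}" "$dev" "${target}${pair#*:}"
    done
}
```

Usage: `mount_subvols /dev/nvme0n1p3 /mnt @home:/home @snapshots:/.snapshots @var-cache:/var/cache @var-log:/var/log`, followed by `mount /dev/nvme0n1p1 /mnt/boot/efi`.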

Part 5: Snapper Setup

Installation

emerge app-backup/snapper sys-boot/grub-btrfs
  • snapper: Snapshot management (from openSUSE)
  • grub-btrfs: Adds bootable snapshots to GRUB menu

Initial Configuration

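# snapper create-config creates /.snapshots as a nested subvolume and
# fails if @snapshots is already mounted there, so move it aside first
umount /.snapshots
rmdir /.snapshots

# Create root configuration
snapper -c root create-config /

# Swap the nested .snapshots subvolume for the top-level @snapshots
btrfs subvolume delete /.snapshots
mkdir /.snapshots
mount /.snapshots
chmod 750 /.snapshots

# Enable the config (create-config normally registers it already)
echo 'SNAPPER_CONFIGS="root"' > /etc/conf.d/snapper

# Verify
snapper list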

Configuration File

File: /etc/snapper/configs/root

# snapper configuration for root filesystem

# Subvolume to snapshot
SUBVOLUME="/"

# Filesystem type
FSTYPE="btrfs"

# Allow unprivileged users (leave empty for root-only)
ALLOW_USERS=""
ALLOW_GROUPS=""

# Create timeline snapshots
TIMELINE_CREATE="yes"

# Cleanup old snapshots by timeline
TIMELINE_CLEANUP="yes"

# Minimum age before cleanup, in seconds (1800 = 30 minutes)
TIMELINE_MIN_AGE="1800"

# Retention counts: keep the last 10 hourly, 7 daily, 0 weekly,
# 3 monthly, and 0 yearly snapshots
TIMELINE_LIMIT_HOURLY="10"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="0"
TIMELINE_LIMIT_MONTHLY="3"
TIMELINE_LIMIT_YEARLY="0"

# Space limits: snapshots may use at most 50% of the filesystem
# (only enforced once Btrfs quota is set up, e.g. snapper setup-quota)
SPACE_LIMIT="0.5"
# Delete snapshots when less than 20% of the filesystem is free
FREE_LIMIT="0.2"

# Compare pre/post snapshots in the background once the post snapshot exists
BACKGROUND_COMPARISON="yes"

# Number cleanup (applies to snapshots created with --cleanup-algorithm number)
NUMBER_CLEANUP="yes"
NUMBER_MIN_AGE="1800"
NUMBER_LIMIT="50"
NUMBER_LIMIT_IMPORTANT="10"

OpenRC Automation (Cron-Based)

Snapper's upstream packaging drives timeline snapshots and cleanup with systemd timers (snapper-timeline.timer and snapper-cleanup.timer). On OpenRC, we use cron.

Hourly Timeline Snapshots:

cat > /etc/cron.hourly/snapper-timeline << 'EOF'
#!/bin/bash
/usr/bin/snapper -c root create --cleanup-algorithm timeline --description "timeline"
EOF
chmod +x /etc/cron.hourly/snapper-timeline

Daily Cleanup:

cat > /etc/cron.daily/snapper-cleanup << 'EOF'
#!/bin/bash
/usr/bin/snapper -c root cleanup timeline
/usr/bin/snapper -c root cleanup number
EOF
chmod +x /etc/cron.daily/snapper-cleanup

Enable cron:

emerge sys-process/cronie
rc-update add cronie default
rc-service cronie start
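The hourly job above only covers the root config. If you later add a config for @home, a job that walks every config listed in SNAPPER_CONFIGS and honors each one's TIMELINE_CREATE setting saves editing cron again. A sketch, assuming SNAPPER_CONFIGS lives in /etc/conf.d/snapper as above; `snapper_timeline_all` and the two override variables are hypothetical.

```shell
#!/bin/bash
# Sketch: create timeline snapshots for every enabled snapper config
# whose config file sets TIMELINE_CREATE="yes".
# snapper_timeline_all, SNAPPER_SYSCONFIG, and SNAPPER_CONFIG_DIR are
# hypothetical names; the paths can be overridden for testing.

SNAPPER_SYSCONFIG=${SNAPPER_SYSCONFIG:-/etc/conf.d/snapper}
SNAPPER_CONFIG_DIR=${SNAPPER_CONFIG_DIR:-/etc/snapper/configs}

snapper_timeline_all() {
    local SNAPPER_CONFIGS="" cfg
    # Read the list of enabled configs, e.g. SNAPPER_CONFIGS="root home"
    [[ -r $SNAPPER_SYSCONFIG ]] && source "$SNAPPER_SYSCONFIG"
    for cfg in $SNAPPER_CONFIGS; do
        # Skip configs that have timeline snapshots turned off
        grep -q '^TIMELINE_CREATE="yes"' "${SNAPPER_CONFIG_DIR}/${cfg}" 2>/dev/null || continue
        snapper -c "$cfg" create --cleanup-algorithm timeline --description "timeline"
    done
}
```

To use it, put the function in /etc/cron.hourly/snapper-timeline in place of the single snapper line, followed by a line that calls `snapper_timeline_all`.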

Part 6: Automatic Pre/Post Emerge Snapshots

This is the killer feature: every package that emerge builds or installs gets a snapshot before and after its package operation. A large update such as emerge -uDN @world therefore creates one pre/post pair per package.

Pre-Emerge Snapshot

File: /etc/portage/bashrc.d/snapper-pre.sh

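#!/bin/bash
# pkg_setup runs before anything is merged; save the snapshot number in ${T}
# (the package's temp dir, shared by all of its phases) for the post hook
if [[ ${EBUILD_PHASE} == "setup" ]]; then
    snapper -c root create --type pre --print-number --cleanup-algorithm number \
        --description "Before emerge ${CATEGORY}/${PN}-${PVR}" > "${T}/snapper-pre-number"
fi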

Post-Emerge Snapshot

File: /etc/portage/bashrc.d/snapper-post.sh

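#!/bin/bash
# Pair the post snapshot with the pre snapshot recorded during setup
if [[ ${EBUILD_PHASE} == "postinst" && -s ${T}/snapper-pre-number ]]; then
    snapper -c root create --type post --pre-number "$(< "${T}/snapper-pre-number")" \
        --cleanup-algorithm number --description "After emerge ${CATEGORY}/${PN}-${PVR}"
fi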

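Load the hooks: Portage sources /etc/portage/bashrc, not a bashrc.d directory, so source the files from there (sourced files don't need to be executable):

echo 'for hook in /etc/portage/bashrc.d/*.sh; do [[ -r ${hook} ]] && source "${hook}"; done; unset hook' >> /etc/portage/bashrc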

Result

emerge firefox
# Snapper automatically creates:
#   Snapshot #15: "Before emerge www-client/firefox-121.0"
#   Snapshot #16: "After emerge www-client/firefox-121.0"

# To see what changed (status lists the files, diff shows the content changes):
snapper status 15..16
snapper diff 15..16

# To undo the update:
snapper undochange 15..16

Part 7: GRUB Integration

grub-btrfs adds every snapshot as a bootable menu entry. If your desktop won’t start, you can boot directly into a snapshot from GRUB.

Configuration

File: /etc/default/grub-btrfs/config

# Location of grub.cfg
GRUB_BTRFS_GRUB_DIRNAME="/boot/grub"

# Title of the snapshot submenu in the GRUB menu
GRUB_BTRFS_SUBMENUNAME="Gentoo Linux snapshots"

# No snapshot path is needed: grub-btrfs finds snapshots by listing the
# filesystem's Btrfs subvolumes. The comments in this file describe the
# other options your installed version supports.

Regenerate GRUB After Snapshots

# Regenerate the GRUB menu every hour so new snapshots show up
cat > /etc/cron.hourly/grub-btrfs << 'EOF'
#!/bin/bash
/usr/sbin/grub-mkconfig -o /boot/grub/grub.cfg
EOF
chmod +x /etc/cron.hourly/grub-btrfs

Boot Menu Result

After configuration, your GRUB menu shows:

Gentoo Linux
Gentoo Linux (advanced options)
Gentoo Linux snapshots --->
    Snapshot 16: After emerge www-client/firefox-121.0
    Snapshot 15: Before emerge www-client/firefox-121.0
    Snapshot 14: timeline (hourly)
    ...

Part 8: Common Operations

List Snapshots

snapper list

# Output:
# Type   | # | Pre # | Cleanup | Description
# single | 0 |       |         | current
# pre    | 1 |       | number  | Before emerge sys-libs/glibc
# post   | 2 | 1     | number  | After emerge sys-libs/glibc
# single | 3 |       | timeline| timeline

Create Manual Snapshot

snapper create --description "Before dangerous experiment"

Compare Snapshots

# Which files changed between snapshot 5 and the current system (0)?
snapper status 5..0

# What content changed between snapshots 5 and 10?
snapper diff 5..10

Undo Changes

# Undo changes made between snapshots 5 and 6
snapper undochange 5..6

# This restores files to snapshot 5 state

Full Rollback

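snapper rollback does not copy files back: it creates a writable snapshot and makes it the Btrfs default subvolume. That only takes effect when / is mounted from the default subvolume, as in openSUSE's layout. With the subvol=/@ entry in /etc/fstab above (and the rootflags=subvol=@ that grub-mkconfig adds to the kernel command line), the machine keeps booting @ after a snapper rollback, so on this layout use undochange or replace @ by hand.

# openSUSE-style layout: boot into the snapshot from the GRUB menu, then:
snapper rollback

# Or specify snapshot number:
snapper rollback 15

# Reboot to apply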
reboot
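With the flat layout in this guide, where /etc/fstab mounts subvol=/@ explicitly, a full rollback means replacing @ with a writable copy of a snapshot. A sketch of that procedure, run as root from a live USB or from a snapshot booted through GRUB; `rollback_flat`, the /mnt/btrfs-top mount point, and `DRY_RUN` are hypothetical. It assumes snapper's snapshots sit at @snapshots/NUMBER/snapshot on the top level of the filesystem, which is where they end up with @snapshots mounted at /.snapshots.

```shell
#!/bin/bash
# Sketch: manual rollback for a flat @ / @snapshots layout.
# rollback_flat and DRY_RUN are hypothetical; with DRY_RUN=1 the
# commands are printed instead of run.

run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# rollback_flat DEVICE SNAPSHOT_NUMBER
rollback_flat() {
    local dev=$1 num=$2 top=/mnt/btrfs-top stamp
    [[ $num =~ ^[0-9]+$ ]] || { echo "usage: rollback_flat DEVICE NUMBER" >&2; return 2; }
    stamp=$(date +%Y%m%d-%H%M%S)
    run mkdir -p "$top"
    # Mount the top level of the filesystem (subvolid 5), where @ and @snapshots live
    run mount -o subvolid=5 "$dev" "$top"
    # Keep the broken root instead of deleting it
    run mv "$top/@" "$top/@.broken-$stamp"
    # A writable snapshot of the chosen snapshot becomes the new @
    run btrfs subvolume snapshot "$top/@snapshots/$num/snapshot" "$top/@"
    run umount "$top"
}
```

Usage: `rollback_flat /dev/nvme0n1p3 15`, then reboot. The old root stays as @.broken-TIMESTAMP; delete it with `btrfs subvolume delete` once the rolled-back system works.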

Delete Snapshot

snapper delete 15
# Or range:
snapper delete 10-20

Part 9: Space Management

Check Compression Effectiveness

# Install compsize
emerge sys-fs/compsize

# Check compression ratio
compsize /

# Output:
# Processed 145632 files
# Type       Original     Disk Usage   Ratio
# zstd       85.4G        52.1G        1.63

Monitor Snapshot Space

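# Total space, including data now held only by snapshots
btrfs filesystem usage /

# Space each snapshot holds exclusively (what deleting it would free).
# Requires Btrfs quotas (btrfs quota enable /, or snapper setup-quota).
while read -r snap; do
    snap=${snap//[!0-9]/}                          # drop markers such as * or - after the number
    [[ -z ${snap} || ${snap} == 0 ]] && continue   # 0 is the live system, not a snapshot
    size=$(btrfs subvolume show "/.snapshots/${snap}/snapshot" 2>/dev/null | awk '/Usage exclusive/ {print $NF}')
    echo "Snapshot ${snap}: ${size:-unknown (quotas not enabled?)}"
done < <(snapper list --columns number | tail -n +3)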

Force Cleanup When Low on Space

snapper cleanup number
snapper cleanup timeline

# If still low, delete old snapshots manually
snapper delete 1-10
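Rather than waiting to notice a full disk, the cleanup can be triggered by free space, for example from the daily cron job. A sketch, assuming `btrfs filesystem usage -b` prints a `Free (estimated):` line with a byte count; `free_bytes`, `cleanup_if_low`, and the 20 GiB default threshold are hypothetical choices.

```shell
#!/bin/bash
# Sketch: run snapper's cleanup algorithms only when free space is low.
# free_bytes, cleanup_if_low, and the 20 GiB default are hypothetical.

# Reads `btrfs filesystem usage -b` output on stdin; prints the
# estimated free space in bytes
free_bytes() {
    awk -F: '/Free \(estimated\)/ { split($2, f, " "); print f[1]; exit }'
}

# cleanup_if_low [MOUNTPOINT] [MIN_FREE_BYTES]
cleanup_if_low() {
    local mnt=${1:-/} min=${2:-$((20 * 1024 * 1024 * 1024))} free
    free=$(btrfs filesystem usage -b "$mnt" | free_bytes)
    [[ -n $free ]] || { echo "could not read free space for $mnt" >&2; return 1; }
    if (( free < min )); then
        snapper -c root cleanup number
        snapper -c root cleanup timeline
    fi
}
```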

Part 10: Troubleshooting

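“snapper: PAM authentication failed”

The snapper command talks to the snapperd daemon over D-Bus, and snapperd only serves root and the users or groups listed in the config. On OpenRC, make sure the system D-Bus is running (rc-update add dbus default, rc-service dbus start).

Fix: Run as root (root can also bypass the daemon with snapper --no-dbus), or allow your user:

# In /etc/snapper/configs/root
ALLOW_USERS="yourusername"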

“Cannot create snapshot: subvolume not mounted”

The .snapshots directory must be a mounted subvolume.

# Verify it's in fstab
grep snapshots /etc/fstab

# Verify it's mounted
mount | grep snapshots

# If not mounted, mount it
mount -o subvol=/@snapshots /dev/nvme0n1p3 /.snapshots
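The two checks above can be combined into one test that also confirms the right subvolume is mounted. findmnt reports a Btrfs subvolume mount's SOURCE as device[subvolume], for example /dev/nvme0n1p3[/@snapshots]. A sketch; `subvol_of` and `check_subvol_mount` are hypothetical names.

```shell
#!/bin/bash
# Sketch: verify that a path is mounted from the expected Btrfs subvolume.
# subvol_of and check_subvol_mount are hypothetical helpers.

# subvol_of SOURCE -> the bracketed subvolume path (no output if none)
subvol_of() {
    local src=$1
    [[ $src =~ \[(.*)\]$ ]] && echo "${BASH_REMATCH[1]}"
}

# check_subvol_mount PATH EXPECTED_SUBVOL
check_subvol_mount() {
    local path=$1 want=$2 src
    src=$(findmnt -n -o SOURCE --mountpoint "$path") || {
        echo "$path is not a mount point" >&2; return 1; }
    [[ $(subvol_of "$src") == "$want" ]] || {
        echo "$path is mounted from $src, expected $want" >&2; return 1; }
}
```

Usage: `check_subvol_mount /.snapshots /@snapshots && echo OK`.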

GRUB Doesn’t Show Snapshots

# Regenerate GRUB config
grub-mkconfig -o /boot/grub/grub.cfg

# Check that grub-btrfs is finding snapshots
/etc/grub.d/41_snapshots-btrfs

# Verify grub-btrfs is executable
ls -la /etc/grub.d/41_snapshots-btrfs

System Won’t Boot After Rollback

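Only the ESP (/boot/efi) sits outside Btrfs, so a rollback never touches it. If your kernel or initramfs lives on the ESP, or you use a separate /boot partition, a rollback can leave the boot files out of sync with the modules in /lib/modules. Repair it from a live USB:

# Boot from live USB, then mount the system and the filesystems chroot needs
mount -o subvol=/@ /dev/nvme0n1p3 /mnt
mount /dev/nvme0n1p1 /mnt/boot/efi
mount --types proc /proc /mnt/proc
mount --rbind /sys /mnt/sys
mount --rbind /dev /mnt/dev
chroot /mnt /bin/bash
mount -a   # remaining subvolumes from /etc/fstab (/var/cache, /var/log)

# Rebuild out-of-tree kernel modules and the initramfs, then the GRUB menu
emerge @module-rebuild
dracut --force --kver "$(ls /lib/modules | sort -V | tail -n 1)"   # newest installed kernel
grub-mkconfig -o /boot/grub/grub.cfg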

This configuration has saved me from dozens of failed upgrades. The key insight: treat your system as immutable-by-default. Every change creates a snapshot. Every mistake is reversible. Sleep well knowing that rm -rf /lib is a 2-minute recovery, not a 2-day reinstall.