05:06 - The NAS is acting strange. Slow writes. Random I/O errors. Time to check the filesystem.
sudo btrfs check /dev/sda1
Waited. And waited. Then:
ERROR: could not find extent tree
That's bad. Really bad.
05:30 - The extent tree is how Btrfs tracks which blocks are in use. Without it, the filesystem doesn't know what's data and what's free space. It's like losing the index to a library: the books are still there, but good luck finding anything.
First rule: don't make it worse.
sudo mount -o remount,ro /dev/sda1
Read-only mode. No writes until I understand what happened.
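The kernel refuses a read-only remount while something still has files open for writing, so it's worth confirming the device really is read-only before going further. A minimal sketch, assuming Linux's /proc/mounts layout (device, mount point, type, comma-separated options). is_mounted_ro is a hypothetical helper name, and the optional second argument exists only so it can be pointed at a saved copy of the file.

```shell
# Hypothetical helper: succeed only if the device appears in the mounts
# file AND every mount of it (subvolumes can be mounted more than once)
# carries the "ro" option.
is_mounted_ro() {
  local dev="$1" mounts="${2:-/proc/mounts}"
  awk -v dev="$dev" '
    $1 == dev {
      seen = 1
      ro = 0
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") ro = 1
      if (!ro) writable = 1   # at least one mount of this device is rw
    }
    END { exit !(seen && !writable) }
  ' "$mounts"
}
```

Usage: is_mounted_ro /dev/sda1 && echo "still read-only". A non-zero exit means the device is either writable somewhere or not mounted at all.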
06:15 - Tried mounting from the backup tree roots:
sudo mount -o ro,rescue=usebackuproot /dev/sda1 /mnt/recovery
It mounted. I can see my files. But I don't trust this filesystem for anything permanent anymore.
07:00 - Started copying critical data to another drive:
rsync -avP /mnt/recovery/important/ /mnt/backup/important/
The extent tree corruption means even "successful" reads might be returning garbage. Verify everything copied correctly.
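One way to do that verification is to hash every file on both sides and compare the two lists. A sketch, assuming GNU find, sort and coreutils sha256sum; verify_tree is a hypothetical name. rsync can do a similar comparison on its own with -c (--checksum) plus -n (--dry-run), which lists files whose contents differ without copying anything.

```shell
# Hypothetical helper: build a sorted "sha256  ./relative/path" manifest
# for each tree and diff them. Prints any differences (missing files or
# mismatched hashes) and returns non-zero if there are any.
verify_tree() {
  local src="$1" dst="$2" tmp rc
  [ -d "$src" ] && [ -d "$dst" ] || return 2
  tmp=$(mktemp -d) || return 2
  # NUL-separated file names survive spaces and newlines in paths.
  (cd "$src" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$tmp/src"
  (cd "$dst" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$tmp/dst"
  diff "$tmp/src" "$tmp/dst"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

Usage: verify_tree /mnt/recovery/important /mnt/backup/important. Two caveats: it reads every source file again, which is another full pass over a failing drive, and it only proves the copy matches what the source returned on this read, not that the source data was correct.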
08:30 - Recovery options, in order of increasing desperation:
- btrfs check --readonly: just look, don't touch
- btrfs rescue super-recover: try alternate superblocks
- btrfs check --repair: attempt fixes (DANGEROUS)
- btrfs restore: extract files to a new location (last resort)
I went with option 4. The filesystem was too corrupted to trust repairs.
sudo btrfs restore /dev/sda1 /mnt/backup/
This bypasses the kernel's filesystem driver and the damaged extent tree: it reads the file and directory trees straight off the disk and copies out whatever it can reach. Slow, but safer than trusting a corrupted extent tree.
12:00 - Data recovered. 2.3TB across 1.2 million files. Spot-checked critical directories. Everything looks intact.
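A restore run can be previewed before committing to the long copy. A sketch, assuming btrfs-progs' restore flags -D (--dry-run), -v (--verbose), -i (--ignore-errors) and -m (--metadata); restore_with_preview is a hypothetical wrapper. Run it against the unmounted device, with the destination on a different, healthy disk.

```shell
# Hypothetical wrapper around btrfs restore: list what would be recovered
# first, then copy it out.
restore_with_preview() {
  local dev="$1" dest="$2" preview
  preview=$(mktemp) || return 1
  # -D: dry run, only list the files that would be recovered
  # -v: print each file path
  btrfs restore -D -v "$dev" "$dest" > "$preview" || { rm -f "$preview"; return 1; }
  echo "dry run printed $(wc -l < "$preview") lines; full list in $preview"
  # -i: keep going past unreadable files instead of aborting
  # -m: restore owner, mode and timestamps as well as contents
  btrfs restore -v -i -m "$dev" "$dest"
}
```

Usage: restore_with_preview /dev/sda1 /mnt/backup. The dry run walks the trees a second time, which is extra reading on a failing drive; skipping it is a reasonable trade if the disk is getting worse.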
The Post-Mortem:
Checked dmesg from before the corruption:
BTRFS warning: csum failed root 5 ino 257 off 4096
BTRFS error: bdev /dev/sda1 errs: wr 0, rd 47, flush 0, corrupt 12, gen 0
Read errors and corruption warnings I'd been ignoring for weeks. The drive was dying. The extent tree corruption was just the final symptom.
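Those counters are easy to miss in a scrolling kernel log. A sketch that pulls them out, assuming the "bdev ... errs: wr N, rd N, flush N, corrupt N, gen N" format shown above; btrfs_err_counters is a hypothetical helper. The kernel prints running totals, so only the last report per device is kept, and the function exits non-zero when any counter is above zero. btrfs device stats /dev/sda1 reads the same counters directly.

```shell
# Hypothetical helper: read kernel log text on stdin, extract the Btrfs
# per-device error counters, print every non-zero one, and exit 1 if any
# were found (0 if all counters are zero or no report was seen).
btrfs_err_counters() {
  sed -n 's/.*bdev \([^ ]*\) errs: wr \([0-9]*\), rd \([0-9]*\), flush \([0-9]*\), corrupt \([0-9]*\), gen \([0-9]*\).*/\1 \2 \3 \4 \5 \6/p' |
    awk '
      BEGIN { split("wr rd flush corrupt gen", name, " ") }
      { last[$1] = $0 }                 # later reports supersede earlier ones
      END {
        found = 0
        for (d in last) {
          n = split(last[d], f, " ")
          for (i = 2; i <= n; i++)
            if (f[i] + 0 > 0) { print d, name[i - 1] "=" f[i]; found = 1 }
        }
        exit found
      }'
}
```

Usage: dmesg | btrfs_err_counters, or journalctl -k | btrfs_err_counters to include older boots' logs where the journal keeps them.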
The Lessons:
- Don't ignore filesystem warnings. "csum failed" means checksums don't match. That's corruption, not a suggestion.
- Monitor drive health. smartctl -a /dev/sda would have shown the degradation before it became catastrophic.
- Btrfs extent tree loss doesn't have to mean data loss if you act fast and don't write anything. Read-only mount, back up data, figure out root cause.
- btrfs restore is underrated. When the filesystem metadata is corrupted, it can still extract data by scanning for file structures directly.
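smartctl also reports its verdict in its exit status, a bitmask documented in the smartctl man page, which makes it easy to script from cron. A sketch of decoding it; smart_status_bits is a hypothetical name and the bit descriptions are paraphrased from the man page.

```shell
# Decode the smartctl exit-status bitmask: print one line per set bit,
# return 0 only if no bit is set.
smart_status_bits() {
  local code="$1" bit=0 msg
  set -- \
    "command line did not parse" \
    "device open failed or did not identify" \
    "a SMART or ATA command failed, or SMART data checksum error" \
    "SMART status check returned DISK FAILING" \
    "prefail attributes at or below threshold" \
    "some attributes were at or below threshold in the past" \
    "device error log has records of errors" \
    "self-test log has records of errors"
  for msg in "$@"; do
    if [ $(( (code >> bit) & 1 )) -eq 1 ]; then
      echo "bit $bit: $msg"
    fi
    bit=$((bit + 1))
  done
  [ "$code" -eq 0 ]
}
```

Usage: smartctl -a /dev/sda > /dev/null; smart_status_bits $?. For continuous monitoring, smartmontools also ships the smartd daemon, which runs these checks on a schedule.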
14:00 - New drive ordered. Old drive marked as "do not trust." Data safe.
The NAS will be rebuilt this weekend. This time with proper SMART monitoring.
The warnings started three weeks ago. I saw them in dmesg. I thought "probably fine." It was not fine.