The RAID That Almost Ate Christmas


Date: December 29, 2025
Duration: Multiple sessions across 4 days
Issue: Synology NAS failure, data recovery needed
Root Cause: Btrfs corruption + USB transport instability


The Setup

Dad’s Synology NAS died. Four 6TB HGST drives in SHR (basically RAID5). Years of family photos, documents, and media. The NAS wouldn’t boot, and every attempt to access the volume failed.

I grabbed the drives and brought them to my Gentoo workstation. Time for some data archaeology.


The Layer Cake

Synology doesn’t just put files on a disk. Oh no. That would be too easy.

Physical disks (/dev/sdX)
└── Partition sdX5 (linux_raid_member)
    └── mdadm RAID5 (/dev/md127)
        └── LVM Physical Volume
            └── Volume Group: vg1
                └── Logical Volume: vg1-volume_1
                    └── Filesystem: Btrfs

Five layers. Each one had to work before I could see a single file.


Phase 1: Disk Discovery

Connected all four drives via USB docking station. The system saw them immediately.

lsblk -f

The sdX5 partitions showed as linux_raid_member with label Redcone:2. Good sign — the RAID metadata was intact.
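
To double-check before assembling anything, mdadm can read the RAID superblock from each member partition without touching it. A quick sketch (device letters are whatever the USB dock happened to assign):

sudo mdadm --examine /dev/sdb5
# Prints the member superblock: array UUID, RAID level, device
# role, and last update time. Repeat for sdc5, sdd5, and sde5,
# and confirm all four report the same array UUID and event count.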


Phase 2: RAID Assembly

First problem: /proc/mdstat returned “No such file or directory.”

The kernel module wasn’t loaded.

sudo modprobe md_mod
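
Depending on the kernel config, the RAID5 personality lives in its own module (raid456 on most distributions), so it is worth confirming both are present. A sketch, not necessarily something this particular kernel needed:

lsmod | grep -E 'md_mod|raid456'
# If the personality module is missing, load it explicitly:
sudo modprobe raid456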

Now to assemble the RAID. The key word here is read-only. Never, ever touch source data during recovery.

sudo mdadm --assemble --readonly /dev/md127 \
  /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5
cat /proc/mdstat
# md127 : active (read-only) raid5 [UUUU]

All four drives present. Array intact.
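
For a fuller picture than /proc/mdstat, mdadm can report per-device state on the assembled array. Still read-only, it just queries the running array:

sudo mdadm --detail /dev/md127
# Shows array state, chunk size, and which partition occupies
# each slot. Worth confirming a clean state and 4 of 4 devices
# active before going further down the stack.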


Phase 3: LVM Activation

sudo pvscan
sudo vgscan
sudo vgchange -ay

Got a warning about “old PV header” — typical with Synology’s modified LVM. Ignored it.

lsblk | grep vg1-volume_1

The logical volume appeared. Three layers down, two to go.
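
A quick way to see exactly what the volume group exposes, without activating anything else (a sketch; the column list is arbitrary):

sudo lvs -o vg_name,lv_name,lv_size,lv_attr
# Expect a single vg1/volume_1 entry sized close to the usable
# capacity of the four-drive SHR array (roughly 3 x 6TB).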


Phase 4: The Btrfs Problem

This is where things got ugly.

Tried mounting:

sudo mount -o ro /dev/mapper/vg1-volume_1 /mnt/recovery

Failed. Btrfs complained about corrupted metadata.

The temptation to run btrfs check --repair was real. But that command modifies data. On a recovery disk. With irreplaceable family photos.

Nope.
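
For completeness: there are still read-only options short of commercial tools. Recent kernels accept rescue mount options, and btrfs restore can pull files out of an unmountable filesystem without writing to the source. A sketch, assuming a reasonably new kernel and btrfs-progs; neither command modifies the source volume:

# Backup-root mount (kernel 5.9+ spelling; older kernels use
# -o ro,usebackuproot instead):
sudo mount -o ro,rescue=usebackuproot /dev/mapper/vg1-volume_1 /mnt/recovery

# If that still fails, btrfs restore copies files out read-only.
# -D is a dry run, -i ignores errors, -v lists what it finds:
sudo btrfs restore -Div /dev/mapper/vg1-volume_1 /mnt/recovery_dest

Whether either would have beaten the corruption here is an open question, but they keep the read-only promise intact.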


Phase 5: The Destination Disaster

Borrowed a 24TB USB HDD for the recovery destination. Formatted ext4, mounted at /mnt/recovery_dest.
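
The destination prep itself was roughly this, with a hypothetical device name:

sudo mkfs.ext4 -L recovery_dest /dev/sdf1   # device name is hypothetical
sudo mkdir -p /mnt/recovery_dest
sudo mount /dev/sdf1 /mnt/recovery_dest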

Started the copy. Watched it progress. Then:

ls /mnt/recovery_dest
# Input/output error

The destination drive had disconnected. Mid-copy. Without warning.

Checked dmesg:

error -71
USB disconnect
uas_eh_abort_handler
EXT4 journal aborted
Remounting filesystem read-only

USB docks fail under sustained write load. The cable was fine. The dock was fine. But together, under continuous data transfer, they gave up.

This wasn’t ext4’s fault. This was transport failure.
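
The error -71 plus uas_eh_abort_handler pattern points at the UAS driver losing the device. If you are stuck with a dock that does this, one common workaround is forcing it onto the older usb-storage path via a quirk; the VID:PID pair below is hypothetical, read the real one from lsusb:

lsusb
# Find the dock's ID, e.g. "ID 1234:5678" (hypothetical values)
echo "options usb-storage quirks=1234:5678:u" | \
  sudo tee /etc/modprobe.d/usb-dock-quirk.conf
# The :u flag tells the kernel to ignore UAS for that device.
# Takes effect after a module reload or reboot, then replug.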


Phase 6: UFS Explorer

Linux tools gave me block-level access. But for file-level recovery from corrupted Btrfs? I needed something stronger.

UFS Explorer understands the full stack: MD RAID → LVM → Btrfs. It can reconstruct directory trees without mounting.

Opened UFS, pointed it at the assembled RAID. It found the Btrfs volume. Started browsing directories.

Documents: visible. Photos: visible. All those family memories: right there.


Phase 7: The Copy (Take Two)

Switched to a USB NVMe enclosure with a Samsung 970 EVO Plus. Solid state, no moving parts, stable under sustained writes.

Started the priority copy:

  1. Documents folder first
  2. Family photos second
  3. Everything else after

This time, no disconnects. No errors. Just steady progress.


The Lessons

Read-only is sacred. When recovering data, you are not allowed to write to source disks. Period. Use --readonly flags everywhere.
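
Belt and braces: beyond mdadm's --readonly, the kernel can mark the source drives read-only at the block layer before anything touches them. A sketch, not something this recovery strictly required:

for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
  sudo blockdev --setro "$dev"
done
sudo blockdev --getro /dev/sdb   # prints 1 once the device is read-only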

USB docks lie. They claim to support sustained data transfer. Under recovery workloads — hundreds of GB, continuous writes — they choke. Use NVMe or internal SATA when possible.

The layer cake matters. Synology’s storage stack is: disk → partition → mdadm → LVM → Btrfs. You must work through each layer in order. Skipping steps doesn’t work.

Don’t repair before extracting. btrfs check --repair sounds helpful. It’s not. It modifies metadata. On corrupted filesystems, that can destroy your last chance at recovery.

Professional tools exist for a reason. I spent a day trying to do this with native Linux utilities. UFS Explorer solved it in an hour.


The Status

Component                Status
RAID assembly            Working
LVM activation           Working
Native Btrfs mount       Failed (corruption)
USB HDD destination      Failed (transport)
NVMe destination         Working
UFS Explorer recovery    Success

Documents and photos recovered. Dad has his data back.


The drives are still sitting on my desk. I should probably ship them back. But first, maybe I’ll run a few more verification passes. Just to be safe.
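
For those verification passes, a checksum manifest of the recovered tree is the simple option: generate it once on the recovery SSD, then any later copy can be checked against it. A sketch, with hypothetical paths:

cd /mnt/recovery_dest
find . -type f -print0 | xargs -0 sha256sum > ~/recovery-manifest.sha256
# Later, from the root of any new copy of the data:
cd /path/to/new/copy && sha256sum -c --quiet ~/recovery-manifest.sha256
# --quiet prints only the files that fail verification.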