The RAID That Refused to Rebuild
Date: December 5-6, 2025
Duration: Two days
Messages: 86 (across Claude sessions)
Issue: RAID array wouldn’t accept replacement drive
Result: Recovered, but not how I expected
The Alert
Storage Manager: “Storage Pool 1 is degraded.”
One of the four drives in the Synology had failed. Normal enough - drives die. That’s why we have RAID.
Pulled the dead drive. Inserted a new one. Waited for the rebuild.
And waited.
And waited.
The Symptom
Storage Manager showed the new drive. It was detected. It was healthy. But the “Repair” option was grayed out.
The array refused to accept its replacement.
First Hypothesis: Wrong Drive Size
The original array was 4x4TB drives. The replacement was… also 4TB. Same model family, even.
# SSH to NAS
cat /proc/partitions
New drive showed fewer sectors. A fraction of a megabyte smaller. Close enough for most purposes. Not close enough, it would turn out, for Synology’s RAID implementation.
The array wanted exactly the same size or larger. This drive, despite being “4TB,” had slightly fewer usable sectors. At that point it was a suspicion, not a confirmed cause - so I kept looking.
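For comparing members directly, blockdev reports each disk’s size in 512-byte sectors - a minimal sketch, assuming the four members sit at sda through sdd as they did here:
# Print each member disk's size in 512-byte sectors
# (device names assumed - check yours with cat /proc/partitions)
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    echo "$d: $(blockdev --getsz $d) sectors"
done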
Second Hypothesis: Bad Sectors
Maybe the new drive had issues.
smartctl -a /dev/sdd
Clean. Zero reallocated sectors. Zero pending sectors. The drive was healthy.
Not the problem.
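If you only want the counters that predict failure, something like this narrows smartctl’s output to just those (standard ATA SMART attribute names; NVMe drives report health differently):
# Show only the failure-predicting SMART attributes
smartctl -A /dev/sdd | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'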
Third Hypothesis: Partition Table Corruption
The failed drive might have left garbage in the partition scheme.
cat /proc/mdstat
md2 : active raid5 sdc5[2] sdb5[1] sda5[0]
11708923392 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
[UUU_] - three drives active, one missing. But no fourth drive was trying to join.
The array knew a drive was missing. It just didn’t want the replacement.
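mdadm gives a more explicit view of the same state - it lists each slot individually, and a vacated slot shows as removed. A sketch, assuming DSM’s usual md device naming holds:
# Per-slot state for the degraded array (needs root)
mdadm --detail /dev/md2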
The Actual Problem
Deep in DSM’s logs:
Volume1: Drive 4 partition mismatch - expected 3907018584 sectors, got 3907018240
The replacement drive was 344 sectors smaller than the original. About 176KB.
176KB difference on a 4TB drive. 0.000004% smaller.
And that was enough to fail the rebuild.
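The arithmetic, assuming the standard 512-byte sector (which is what the 176KB figure implies):
# Shortfall in sectors, then in bytes (assumes 512-byte sectors)
echo $(( 3907018584 - 3907018240 ))   # 344 sectors
echo $(( 344 * 512 ))                 # 176128 bytes - about 176KB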
The Solution Options
Option 1: Find an identical drive
Hunt down the exact same model with the exact same sector count. Possible, but annoying.
Option 2: Shrink the existing partitions
Theoretically possible. Practically terrifying on a live RAID.
Option 3: Use a larger drive
A 6TB drive would definitely have enough sectors. Wasteful, but works.
What I Actually Did
Checked my spare drives. Found a 5TB that I’d forgotten about.
# Check sector count
smartctl -i /dev/sdd | grep "User Capacity"
5TB = 9,767,541,168 sectors. Way more than needed.
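Sanity check on that count, again at 512 bytes per sector:
# 512-byte sectors to bytes
echo $(( 9767541168 * 512 ))   # 5000981078016 bytes - just over 5TB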
Swapped in the 5TB. Storage Manager immediately offered the Repair option.
Repairing Storage Pool 1...
Time remaining: 18 hours
The extra 1TB would go unused (RAID 5 can only use as much of each drive as its smallest member), but the array accepted it.
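DSM drives the rebuild through its own tooling, but underneath it’s standard Linux md. On a plain md setup, the manual equivalent would look roughly like this (partition name assumed to follow the sdX5 pattern of the other members - don’t run this blind on a Synology):
# Add the new member and let md start recovery
mdadm --manage /dev/md2 --add /dev/sdd5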
The 18-Hour Wait
Rebuild started at 11 PM. Finished the next afternoon.
During rebuild:
- Read/write performance dropped to maybe 30% of normal
- System stayed accessible
- Didn’t lose any data
The whole time, the array was running with zero redundancy. If another drive had failed mid-rebuild, the whole pool would have been lost.
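If you’d rather watch the rebuild from a shell than from Storage Manager, mdstat carries a live progress line, and md’s sysfs tree exposes raw counters (standard Linux md paths, the same tree used for the scrub below):
# Live rebuild progress, refreshed every 10 seconds
watch -n 10 cat /proc/mdstat
# Raw progress counter (sectors done / sectors total)
cat /sys/block/md2/md/sync_completed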
Post-Rebuild Verification
cat /proc/mdstat
md2 : active raid5 sdd5[4] sdc5[2] sdb5[1] sda5[0]
11708923392 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
[UUUU] - all four drives active. Array healthy.
Scrub to verify:
echo check > /sys/block/md2/md/sync_action
Scrub completed clean. No mismatches.
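The “no mismatches” verdict comes from a kernel counter you can read directly once the check finishes:
# Inconsistent blocks found by the scrub - 0 means clean
cat /sys/block/md2/md/mismatch_cnt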
Lessons Learned
Sector count matters. “4TB” isn’t a precise specification. Different models, different manufacturers, even different batches can have different sector counts. Always replace with a drive whose sector count matches or exceeds the old one.
Check before you buy. Look up the exact sector count of your existing drives. Match or exceed.
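The number to write down is the exact sector count, not the marketing capacity. blockdev prints it directly (one member shown; repeat per drive):
# Exact 512-byte sector count - the replacement must meet or exceed this
blockdev --getsz /dev/sda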
Keep a larger spare. My 5TB spare saved the day. The cost difference between 4TB and 5TB is nothing compared to the convenience of “definitely fits.”
RAID is not backup. During the 18-hour rebuild, I had no redundancy. If I’d lost another drive, the array would have been gone. The important data was also on a different NAS. RAID protects against drive failure, not against data loss.
The Hardware Lesson
Drive manufacturers advertise capacity, not sectors. Two “4TB” drives can differ by millions of sectors.
For RAID:
- Use drives from the same batch when possible
- When replacing, go larger
- Never assume “same capacity” means “same size”
176 kilobytes. That’s all it took to fail a 16 terabyte array rebuild.