Sleeping at Night: Automated KVM Backups with Bash

Sleep is Important

I have 10 VMs running on my workstation. Home Assistant. The Build Swarm orchestrator. A testing sandbox. If the NVMe drive died today, they would be gone.

I spent a Saturday writing a script to fix that anxiety.

The Strategy

Backing up a running KVM (libvirt) VM involves two things:

  1. The Definition: The XML configuration (CPU, RAM, Network map).
  2. The Disk: The .qcow2 image.
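
Both pieces can be read straight from libvirt. As a minimal sketch, assuming the usual four-column layout of virsh domblklist --details (Type, Device, Target, Source), the helper below filters that output down to file-backed disks and skips CD-ROM drives. The function name list_file_disks and the VM name in the usage comment are hypothetical, and paths containing spaces are not handled.

```bash
# Hypothetical helper: read `virsh domblklist <vm> --details` output on stdin
# and print one "target source" pair per file-backed disk.
# Assumes the columns are: Type Device Target Source.
list_file_disks() {
    awk '$1 == "file" && $2 == "disk" { print $3, $4 }'
}

# Usage (not run here; "homeassistant" is a made-up VM name):
#   virsh dumpxml homeassistant > homeassistant.xml              # the definition
#   virsh domblklist homeassistant --details | list_file_disks   # the disk(s)
```

Printing the target (vda, vdb, ...) next to the path makes it easy to spot a VM with more than one disk before deciding how to back it up.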

Crucially, you cannot just copy the disk while the VM is writing to it. The copy can capture half-finished writes and come out inconsistent or corrupt. So we have two choices:

  1. Snapshot Mode: Take an external snapshot, copy the now-idle base image, then merge the changes back with virsh blockcommit (Complex, no downtime).
  2. The Sledgehammer: Shut down, copy, start up (Simple, disruptive).

Since these are homelab VMs, I chose the Sledgehammer (scheduled for 3 AM).
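
For comparison, here is roughly what Snapshot Mode would look like. Treat it as a sketch under assumptions, not something I run: it assumes a VM with a single qcow2 disk, the function names and the overlay filename are made up for illustration, and the copy is only crash-consistent unless the guest agent is installed and --quiesce is added to the snapshot command. Setting DRY_RUN=1 prints the commands instead of running them.

```bash
# Sketch of "Snapshot Mode" for one disk.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# live_backup VM TARGET SRC DEST_DIR
#   VM        libvirt domain name
#   TARGET    disk target inside the domain, e.g. vda
#   SRC       path of the disk image on the host
#   DEST_DIR  folder that receives the copy
live_backup() {
    local vm="$1" target="$2" src="$3" dest_dir="$4"
    # Hypothetical overlay location, next to the original image.
    local overlay="${src%.*}.backup-overlay.qcow2"
    local rc=0

    # 1. Redirect new writes to a temporary overlay, so SRC stops changing.
    run virsh snapshot-create-as "$vm" backup-snap --disk-only --atomic \
        --no-metadata --diskspec "$target,file=$overlay" || return 1

    # 2. Copy the now-stable base image. Remember a failure, but keep going
    #    so the VM is not left running on the overlay.
    run cp --sparse=always "$src" "$dest_dir/" || rc=1

    # 3. Merge the overlay back into SRC and switch the VM back to it.
    run virsh blockcommit "$vm" "$target" --active --pivot || return 1

    # 4. The overlay is no longer in use and can go.
    run rm -f -- "$overlay"
    return "$rc"
}
```

Four commands per disk, plus the failure cases between them, is the "Complex" part. For VMs nobody uses at 3 AM, the Sledgehammer is easier to reason about.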

The Script

#!/bin/bash
# /usr/local/bin/vm-backup.sh

BACKUP_ROOT="/backups/vms"
DATE=$(date +%Y%m%d)
TARGET_DIR="$BACKUP_ROOT/$DATE"

mkdir -p "$TARGET_DIR"

# Get list of running VMs
VMS=$(virsh list --name)

for VM in $VMS; do
    echo "Processing $VM..."
    
    # 1. Dump XML Config
    virsh dumpxml "$VM" > "$TARGET_DIR/$VM.xml"
    
    # 2. Get Disk Path (first file-backed disk only; CD-ROM entries are skipped)
    DISK_PATH=$(virsh domblklist "$VM" --details | awk '$1 == "file" && $2 == "disk" {print $4; exit}')
    DISK_NAME=$(basename "$DISK_PATH")
    
    # 3. Shutdown
    echo "Stopping $VM..."
    virsh shutdown "$VM"
    
    # Wait for shutdown (timeout 60s)
    TIMEOUT=0
    while virsh list --name | grep -qxF -- "$VM"; do
        sleep 5
        TIMEOUT=$((TIMEOUT + 5))
        if [ "$TIMEOUT" -ge 60 ]; then
            echo "Timeout waiting for shutdown. Forcing..."
            virsh destroy "$VM"
            break
        fi
    done
    
    # 4. Copy Disk
    echo "Backing up $DISK_NAME..."
    # Use sparse copy to save space!
    cp --sparse=always "$DISK_PATH" "$TARGET_DIR/$DISK_NAME"
    
    # 5. Start
    echo "Starting $VM..."
    virsh start "$VM"
done

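# Cleanup old backups (delete dated folders last modified more than 7 days ago)
# -mindepth 1 makes sure $BACKUP_ROOT itself is never a match
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +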
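
A note on the cleanup line: -mtime +7 is easy to misread. find counts age in whole 24-hour periods, so +7 only matches folders last modified at least eight days ago. Before trusting any rm -rf, it can help to print what would be deleted first. A minimal sketch, with list_expired as a hypothetical helper name and GNU find assumed:

```bash
# list_expired ROOT DAYS
# Print the backup folders directly under ROOT that a cleanup with
# "-mtime +DAYS" would delete. Nothing is removed.
list_expired() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime "+$2" -print
}

# Usage (not run here):
#   list_expired /backups/vms 7
```

If the list looks right, the same find expression can be trusted with -exec rm -rf.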

The “Sparse” Trick

The flag cp --sparse=always is magic. My Windows VM has a 100GB allocated disk, but it only uses 20GB of space. A copy that writes out every empty block produces a file that occupies the full 100GB. A sparse copy turns the long runs of zeros into holes, so you get a 100GB logical file that only takes up 20GB on disk.
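
You can see the effect without touching a real VM image. The sketch below assumes GNU coreutils (cp --sparse, stat -c, dd with bs=1M) and a filesystem that supports holes; sparse_demo is a made-up name. It writes 8 MiB of real zeros, copies them with --sparse=always, and reports the logical size and the allocated size of the copy.

```bash
# sparse_demo: print "<logical bytes> <allocated KiB>" for a sparse copy
# of an 8 MiB file that was written out as real zero blocks.
sparse_demo() {
    local dir
    dir=$(mktemp -d) || return 1

    # Write 8 MiB of actual zeros (this file is not sparse on purpose).
    dd if=/dev/zero of="$dir/full.img" bs=1M count=8 2>/dev/null

    # Copy it, letting cp turn runs of zeros into holes.
    cp --sparse=always "$dir/full.img" "$dir/sparse.img"

    # stat reports the logical size, du reports the space actually allocated.
    echo "$(stat -c %s "$dir/sparse.img") $(du -k "$dir/sparse.img" | cut -f1)"

    rm -rf "$dir"
}
```

The first number stays at 8388608 bytes. On the usual Linux filesystems the second number should come out far below 8192 KiB, which is the same gap as the 100GB versus 20GB above.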

Automation

Cron. Simple, reliable, works everywhere.

# /etc/cron.d/vm-backup
0 3 * * * root /usr/local/bin/vm-backup.sh >> /var/log/vm-backup.log 2>&1

Make sure cronie is running (these are OpenRC commands):

rc-update add cronie default
rc-service cronie start

Result

Every morning, I wake up to a folder full of .qcow2 images. I’ve restored from them twice. It works perfectly. Cost: $0. Peace of mind: Infinite.
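
For completeness, a restore looks roughly like this. It is a sketch under assumptions rather than a polished tool: it assumes a single-disk VM that is currently shut off, that the first file-backed source in the saved XML is that disk, and that the backup folder is laid out the way the script above writes it. Both function names are hypothetical.

```bash
# disk_path_from_xml FILE
# Print the first <source file='...'/> path found in a saved `virsh dumpxml`
# file. Assumes libvirt's single-quoted attribute style.
disk_path_from_xml() {
    sed -n "s/.*<source file='\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# restore_vm BACKUP_DIR VM
# Copy the disk back to its original location, re-register the VM, start it.
restore_vm() {
    local dir="$1" vm="$2" disk
    disk=$(disk_path_from_xml "$dir/$vm.xml")
    [ -n "$disk" ] || { echo "no disk path found in $dir/$vm.xml" >&2; return 1; }

    # Put the image back where the XML expects it.
    cp --sparse=always "$dir/$(basename "$disk")" "$disk" || return 1

    # Re-create the definition (harmless if the VM is still defined).
    virsh define "$dir/$vm.xml" || return 1
    virsh start "$vm"
}

# Usage (not run here; names are made up):
#   restore_vm /backups/vms/20240101 homeassistant
```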