Sleeping at Night: Automated KVM Backups with Bash

Sleep is Important

I have 10 VMs running on my workstation. Home Assistant. The Build Swarm orchestrator. A testing sandbox. If the NVMe drive died today, they would be gone.

I spent a Saturday writing a script to fix that anxiety.

The Strategy

Backing up a running KVM (libvirt) VM involves two things:

  1. The Definition: The XML configuration (CPU, RAM, network interfaces, disk paths).
  2. The Disk: The .qcow2 image.

Crucially, you cannot safely copy the disk while the VM is writing to it. The copy captures blocks from different moments in time, and the result can be a corrupted image. So we have two choices:

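  1. Snapshot Mode: Divert writes to a temporary external snapshot, copy the now-frozen base image, then merge the snapshot back with virsh blockcommit (Complex, no downtime).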
  2. The Sledgehammer: Shut down, copy, start up (Simple, disruptive).

Since these are homelab VMs, I chose the Sledgehammer (scheduled for 3 AM).
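For comparison, here is a rough sketch of the Snapshot Mode I skipped. It is not what my script does. It assumes a single-disk VM (with --disk-only, libvirt snapshots every disk unless told otherwise), a known target name such as vda, and that libvirt can write the temporary overlay next to the image. The helper snapshot_backup_disk and the snapshot name backup-tmp are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper (not part of vm-backup.sh): live backup of one disk
# via an external snapshot plus blockcommit. Assumes a running, persistent,
# single-disk domain with no existing snapshot chain.
snapshot_backup_disk() {
    local vm=$1 target=$2 src=$3 dest=$4
    local overlay="${src%.*}.backup-overlay.qcow2"

    # 1. Redirect guest writes into a temporary overlay; $src stops changing.
    virsh snapshot-create-as "$vm" backup-tmp \
        --disk-only --atomic --no-metadata \
        --diskspec "$target,file=$overlay" || return 1

    # 2. Copy the now-frozen base image while the VM keeps running.
    cp --sparse=always "$src" "$dest" || echo "copy of $src failed" >&2

    # 3. Merge the overlay back into $src and pivot the VM back onto it.
    #    Done even if the copy failed, so the overlay never lingers.
    virsh blockcommit "$vm" "$target" --active --wait --pivot || return 1

    # 4. The overlay is no longer referenced by the VM.
    rm -f "$overlay"
}

# Usage (domain and paths are examples):
# snapshot_backup_disk homeassistant vda /var/lib/libvirt/images/ha.qcow2 /backups/ha.qcow2
```

Three virsh steps instead of shutdown/start is the "Complex" part: if the blockcommit fails, the VM is left running on the overlay and needs manual cleanup.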

The Script

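#!/bin/bash
# /usr/local/bin/vm-backup.sh

BACKUP_ROOT="/backups/vms"
DATE=$(date +%Y%m%d)
TARGET_DIR="$BACKUP_ROOT/$DATE"
SHUTDOWN_TIMEOUT=60   # seconds; slow guests (e.g. Windows) may need more

mkdir -p "$TARGET_DIR" || exit 1

# Get list of running VMs, one per array element (drops virsh's trailing blank line)
mapfile -t VMS < <(virsh list --name | sed '/^$/d')

for VM in "${VMS[@]}"; do
    echo "Processing $VM..."

    # 1. Dump the persistent XML config (the form "virsh define" expects)
    virsh dumpxml --inactive "$VM" > "$TARGET_DIR/$VM.xml"

    # 2. Get disk paths: file-backed disks only (skips CD-ROMs); a VM may
    #    have several. Assumes image paths contain no spaces.
    mapfile -t DISK_PATHS < <(virsh domblklist "$VM" --details \
        | awk '$1 == "file" && $2 == "disk" {print $4}')

    # 3. Shutdown (a graceful request; the guest has to honor it)
    echo "Stopping $VM..."
    virsh shutdown "$VM"

    # Wait for shutdown, forcing power-off after SHUTDOWN_TIMEOUT seconds.
    # grep -xF matches the whole name literally, not as a regex.
    WAITED=0
    while virsh list --name | grep -qxF "$VM"; do
        sleep 5
        WAITED=$((WAITED + 5))
        if [ "$WAITED" -ge "$SHUTDOWN_TIMEOUT" ]; then
            echo "Timeout waiting for shutdown. Forcing..."
            virsh destroy "$VM"
            break
        fi
    done

    # 4. Copy disks (sparse copy to save space!)
    for DISK_PATH in "${DISK_PATHS[@]}"; do
        DISK_NAME=$(basename "$DISK_PATH")
        echo "Backing up $DISK_NAME..."
        cp --sparse=always "$DISK_PATH" "$TARGET_DIR/$DISK_NAME" \
            || echo "ERROR: failed to copy $DISK_PATH" >&2
    done

    # 5. Start the VM again, even if a copy failed
    echo "Starting $VM..."
    virsh start "$VM"
done

# Cleanup old backups. -mtime +7 matches directories last modified 8 or more
# days ago (find rounds ages down to whole days). -mindepth 1 stops find from
# ever matching $BACKUP_ROOT itself.
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +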
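Extracting disk paths from virsh domblklist --details deserves care: a plain grep file also matches a CD-ROM drive (whose Source is "-" when empty), and a VM with two disks returns both paths in one string. The sketch below runs a hand-written sample of that output (columns Type, Device, Target, Source; the domain and paths are hypothetical) through grep file and through an awk filter on the first two columns.

```bash
#!/bin/bash
# Hand-written sample of `virsh domblklist <vm> --details` output
# (hypothetical domain and paths, not captured from a real host).
sample=' Type   Device   Target   Source
---------------------------------------------------------------
 file   disk     vda      /var/lib/libvirt/images/sandbox.qcow2
 file   disk     vdb      /var/lib/libvirt/images/sandbox-data.qcow2
 file   cdrom    sda      -'

# Naive approach: every line mentioning "file", column 4, in one string.
naive=$(printf '%s\n' "$sample" | grep file | awk '{print $4}')

# Filtered approach: only file-backed disks, one path per array element.
mapfile -t disks < <(printf '%s\n' "$sample" \
    | awk '$1 == "file" && $2 == "disk" {print $4}')

echo "naive result:"
printf '  [%s]\n' "$naive"
echo "filtered result (${#disks[@]} disks):"
printf '  [%s]\n' "${disks[@]}"
```

The naive string ends with the CD-ROM's "-", which basename and cp would then happily try to process.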

The “Sparse” Trick

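The flag cp --sparse=always is magic. My Windows VM has a 100GB disk image, but only about 20GB of it holds data; the rest is zeros. A copy that writes every byte produces a file that occupies the full 100GB on the backup drive. With --sparse=always, cp turns long runs of zero bytes into holes instead, so the backup is still a 100GB logical file (what ls -l reports) that only takes up about 20GB on disk (what du reports). GNU cp's default, --sparse=auto, only does this when its heuristic detects that the source file is already sparse.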
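A scaled-down way to see this for yourself: the sketch below builds a 100 MiB file holding 20 MiB of random data, makes a byte-for-byte copy with dd (standing in for a copy that writes every byte), then copies that with cp --sparse=always. It assumes GNU coreutils and a filesystem that supports holes (ext4, XFS, Btrfs and tmpfs do); the file names are arbitrary.

```bash
#!/bin/bash
# Scaled-down demo: a 100 MiB image with 20 MiB of real data.
# Assumes GNU coreutils and a hole-capable filesystem under $TMPDIR.
set -euo pipefail
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

truncate -s 100M "$work/disk.img"                  # 100 MiB, all holes
dd if=/dev/urandom of="$work/disk.img" bs=1M count=20 \
   conv=notrunc status=none                        # write 20 MiB of data

# A copy that writes every byte, zeros included (dd without conv=sparse).
dd if="$work/disk.img" of="$work/naive.img" bs=1M status=none
# cp --sparse=always turns the long zero runs back into holes.
cp --sparse=always "$work/naive.img" "$work/sparse.img"

for f in naive.img sparse.img; do
    echo "$f: apparent $(du -h --apparent-size "$work/$f" | cut -f1)," \
         "on disk $(du -h "$work/$f" | cut -f1)"
done
```

Both files report the same apparent size, while du on the sparse copy should land near the 20 MiB of real data. Copying from the fully allocated naive.img is what shows the always/auto difference: auto would see no holes in it and keep every zero.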

Automation

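I added a systemd timer (because cron is so 2010). A practical bonus: with Persistent=true, a run missed because the machine was off at 3 AM is triggered as soon as the timer is active again, typically right after the next boot.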

/etc/systemd/system/vm-backup.service:

[Unit]
Description=VM Backup Script

[Service]
Type=oneshot
ExecStart=/usr/local/bin/vm-backup.sh

/etc/systemd/system/vm-backup.timer:

[Unit]
Description=Run VM Backup Daily at 3 AM

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
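The units do nothing until the timer is enabled. On a typical systemd distribution that means something like the commands below; systemd-analyze calendar is an optional sanity check of the OnCalendar expression. Note that starting the service by hand runs a real backup, which shuts down every running VM.

```bash
# Pick up the new unit files
sudo systemctl daemon-reload

# Enable the timer (not the service) and start it now
sudo systemctl enable --now vm-backup.timer

# Sanity checks: how systemd parses the schedule, and the next run time
systemd-analyze calendar '*-*-* 03:00:00'
systemctl list-timers vm-backup.timer

# Optional: trigger one backup immediately (stops and restarts all running VMs!)
sudo systemctl start vm-backup.service
```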

Result

Every morning, I wake up to a folder full of .qcow2 images. I’ve restored from them twice. It works perfectly. Cost: $0. Peace of mind: Infinite.
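For completeness, a restore boils down to putting the disk image back where the XML expects it and re-registering the definition. This is the general libvirt procedure, not necessarily the exact steps behind the two restores mentioned above. restore_vm is a hypothetical helper; it assumes a single-disk VM, the original image path, and one of the dated directories the backup script creates.

```bash
#!/bin/bash
# Hypothetical restore helper for one VM from a dated backup directory.
# Assumes a single disk, restored to the original path recorded in the XML.
restore_vm() {
    local vm=$1 backup_dir=$2 disk_path=$3

    # Stop the VM if it is running (an error just means it was already off).
    virsh destroy "$vm" 2>/dev/null

    # Put the image back, keeping it sparse.
    cp --sparse=always "$backup_dir/$(basename "$disk_path")" "$disk_path" || return 1

    # Re-register the saved definition (replaces the persistent config
    # if the domain still exists) and boot it.
    virsh define "$backup_dir/$vm.xml" || return 1
    virsh start "$vm"
}

# Usage (YYYYMMDD and paths are placeholders):
# restore_vm homeassistant /backups/vms/YYYYMMDD /var/lib/libvirt/images/ha.qcow2
```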