Proxmox: Restore Virtual Machines via ZFS snapshots


Sometimes you wish you could go back in time when working with the virtual machines in your homelab. With Proxmox and ZFS snapshots, you can do exactly that.

Introduction

Proxmox is really good at managing a personal virtual environment. However, for managing ZFS snapshots I highly recommend the command line, because some very specific commands are required. So if you don’t like the command line, this tutorial may not be for you.

Use cases

Going back in time may be useful in the following use cases:

  • Ransomware somehow managed to encrypt your data
  • A Windows Update broke something
  • You accidentally deleted a file
    • although in this case it may be easier to mount the snapshot and look for the file instead of restoring the whole VM

Prerequisites

  • No fear of the command line
  • Proxmox installed on a ZFS volume (encryption is also supported)
  • A recent ZFS snapshot to go back to
  • Optional: zfs-auto-snapshot for auto snapshot management
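
If you do not use zfs-auto-snapshot, a manual snapshot taken before a risky change gives you a point to return to. The following is a minimal sketch, assuming the dataset path rpool/data/vm-501-disk-2 that serves as the example throughout this post. The `manual-` prefix and the `snapshot_name` helper are made up for illustration and are not part of any tool.

```shell
# Build a snapshot name with a timestamp suffix (hypothetical naming scheme)
snapshot_name() {
    printf '%s@manual-%s\n' "$1" "$(date +%Y-%m-%d-%H%M)"
}

name="$(snapshot_name rpool/data/vm-501-disk-2)"
echo "$name"

# On the Proxmox host, create the snapshot.
# Commented out here so the sketch has no side effects:
# zfs snapshot "$name"
```

The timestamp keeps manual snapshots easy to tell apart when you list them later.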

Digression: Painlessly installing zfs-auto-snapshot

If you would like to add features such as zfs-auto-snapshot to your Proxmox server painlessly, you could take a look at these setup scripts:

Gathering the required information

Find out the disk identifier

  • Go to the Proxmox web interface
  • Select the virtual machine you would like to revert
  • Go to Hardware and check the value of the Hard Disk entry in use (e.g. local-zfs:vm-501-disk-2,iothread=1,size=100G)
  • Note down the disk identifier (e.g. vm-501-disk-2)
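
If you would rather not pick the identifier out by eye, shell parameter expansion can extract it from the value shown in the Hardware tab. This is a small sketch using the example value from above; the variable names are my own.

```shell
# Example value copied from the Hardware tab (taken from the step above).
# On the host, `qm config 501` should print the same disk lines.
disk_spec='local-zfs:vm-501-disk-2,iothread=1,size=100G'

# Drop the storage name up to the first colon ...
volume="${disk_spec#*:}"
# ... then drop everything from the first comma onwards
disk_id="${volume%%,*}"

echo "$disk_id"
```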

List the existing snapshots

You need a snapshot to go back in time. If you configured zfs-auto-snapshot, you should be able to go back to hourly, daily, weekly, or monthly snapshots. If not, you can only go back to manually created snapshots. Here is how to list them:

zfs list -t snapshot | grep vm-501-disk-2

Output may look like this:

rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-21-0425                        1.18G      -     65.1G  -
...
rpool/data/vm-501-disk-2@zfs-auto-snap_frequent-2023-06-25-1000                     2.42M      -     64.9G  -
rpool/data/vm-501-disk-2@zfs-auto-snap_frequent-2023-06-25-1005                     2.34M      -     64.9G  -
rpool/data/vm-501-disk-2@zfs-auto-snap_frequent-2023-06-25-1010                     2.55M      -     64.9G  -
rpool/data/vm-501-disk-2@zfs-auto-snap_frequent-2023-06-25-1015                     1.23M      -     64.9G  -
rpool/data/vm-501-disk-2@zfs-auto-snap_hourly-2023-06-25-1017                        332K      -     64.9G  -
rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-21-0425                        1.18G      -     65.1G  -
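
As an alternative to grep, you can ask zfs for the snapshots of a single dataset and sort them by creation time, which avoids accidental matches on similarly named disks (for example vm-501-disk-20). This is a sketch under the assumption that your VM disks live below rpool/data, as in the output above; `list_disk_snapshots` is a hypothetical helper name.

```shell
# List all snapshots of one VM disk, oldest first, with creation time and space used
list_disk_snapshots() {
    zfs list -t snapshot -r -o name,creation,used -s creation "rpool/data/$1"
}

# Only call zfs where it is actually installed (i.e. on the Proxmox host)
if command -v zfs >/dev/null 2>&1; then
    list_disk_snapshots vm-501-disk-2
fi
```

With the oldest snapshot at the top, it is easier to see which snapshots are newer than the one you plan to return to.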

Clone the snapshot you want to go back to

Let’s say you had a ransomware attack and you don’t know exactly when the attack started. Rather than blindly rolling back to a snapshot you guessed (which deletes all newer snapshots of the same timeline), you should instead clone the snapshot to a new location. This way you can try it out without destroying any existing data or snapshots.

To try the snapshot from 3 days ago (rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-22-0425):

# clone an existing snapshot to a new disk
# (I use disk-9 as my convention for marking cloned snapshots)
zfs clone rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-22-0425 rpool/data/vm-501-disk-9

# verify the cloned disk exists (lists all disks of VM 501, including the clone)
zfs list | grep vm-501-disk
  rpool/data/vm-501-disk-0                      592K  1.42T      376K  -
  rpool/data/vm-501-disk-1                      496K  1.42T      116K  -
  rpool/data/vm-501-disk-2                      123G  1.42T     65.2G  -
  rpool/data/vm-501-disk-9                        8K  1.42T     65.2G  -
  
# verify the disk has the correct origin
zfs get origin rpool/data/vm-501-disk-9
NAME                      PROPERTY  VALUE
rpool/data/vm-501-disk-9  origin    rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-22-0425

Booting the cloned snapshot

To boot from the clone, you have to trick Proxmox into using the cloned disk instead of the original disk.

  • Go to the Proxmox web interface
  • Stop the virtual machine you are trying to restore
  • Go to the command line and edit the config file manually
    # optional but recommended: backup the original config file (in case you break something)
    cp /etc/pve/nodes/proxmox/qemu-server/501.conf /root/501.conf-2023-06-25
    
    vi /etc/pve/nodes/proxmox/qemu-server/501.conf
    # change vm-501-disk-2 (original) to vm-501-disk-9 (the clone)
    
  • Now boot the virtual machine and check if everything is working as expected
  • If everything works as expected, stop the machine again
  • Revert the changes to 501.conf (yes, revert the config changes you just made)
    vi /etc/pve/nodes/proxmox/qemu-server/501.conf
    # change vm-501-disk-9 (clone) back to vm-501-disk-2 (the original)
    
  • Destroy the clone
    # destroy the clone, it is no longer needed
    zfs destroy -r rpool/data/vm-501-disk-9
    
  • Roll back to the target snapshot (CAUTION: This will also delete ALL snapshots newer than the one you’re rolling back to)
    # rollback to the working state
    zfs rollback -r rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-22-0425
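
Because the rollback destroys every snapshot newer than the target, it can help to list those snapshots before you run it. Here is a small sketch that prints every name following the target in a list sorted oldest-first. `snapshots_newer_than` is a hypothetical helper, and the names in the demo are sample data in the style of the listing earlier in this post.

```shell
# Print every line that follows the target in a list sorted oldest-first
snapshots_newer_than() {
    awk -v target="$1" 'found { print } $0 == target { found = 1 }'
}

# Demo with sample names; on the host, feed it real data instead with:
#   zfs list -H -t snapshot -r -o name -s creation rpool/data/vm-501-disk-2
printf '%s\n' \
    'rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-22-0425' \
    'rpool/data/vm-501-disk-2@zfs-auto-snap_frequent-2023-06-25-1015' \
    'rpool/data/vm-501-disk-2@zfs-auto-snap_hourly-2023-06-25-1017' |
    snapshots_newer_than 'rpool/data/vm-501-disk-2@zfs-auto-snap_daily-2023-06-22-0425'
```

Everything this prints would be gone after the rollback, so check the list for snapshots you still need.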
    

And… that’s it. Now boot up your virtual machine again, and it is as if you actually went back in time.

If you prefer a more visual demonstration of the above process, you could take a look at this video: https://www.youtube.com/watch?v=D1JiI5MfavI&t=1175s

Have fun!

About Andreas Fuhrich
I’m a professional software developer and tech enthusiast from Germany. On this website I share my personal notes and side project details. If you like it, you could support me on GitHub. If not, feel free to file an issue :-)