A quick-start guide to OpenZFS native encryption – Ars Technica


One of the many features OpenZFS brings to the table is ZFS native encryption. First introduced in OpenZFS 0.8, native encryption allows a system administrator to transparently encrypt data at rest within ZFS itself. This obviates the need for separate tools like LUKS, VeraCrypt, or BitLocker.

When encryption=on is set, OpenZFS defaults to aes-256-ccm (prior to 0.8.4) or aes-256-gcm (0.8.4 and later), but the algorithm may also be specified directly. Currently supported algorithms are:

  • aes-128-ccm
  • aes-192-ccm
  • aes-256-ccm (default in OpenZFS < 0.8.4)
  • aes-128-gcm
  • aes-192-gcm
  • aes-256-gcm (default in OpenZFS >= 0.8.4)
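
To pin a specific cipher rather than rely on the version-dependent default, pass the algorithm name as the value of the encryption property. The following is a minimal sketch, not an official tool: the helper name create_encrypted_cmd and the dataset tank/secure are our own illustrative assumptions, and the helper only prints the command so that it can be reviewed before being run as root.

```shell
# Print a `zfs create` command that pins an explicit cipher.
# create_encrypted_cmd and tank/secure are illustrative names, not part of OpenZFS.
create_encrypted_cmd() {
    cipher="${1:-}"
    dataset="${2:-}"
    case "$cipher" in
        aes-128-ccm|aes-192-ccm|aes-256-ccm|aes-128-gcm|aes-192-gcm|aes-256-gcm)
            # Same keyformat/keylocation options the article uses later on.
            printf 'zfs create -o encryption=%s -o keyformat=passphrase -o keylocation=prompt %s\n' \
                "$cipher" "$dataset"
            ;;
        *)
            # Anything outside the supported list is rejected.
            printf 'unsupported cipher: %s\n' "$cipher" >&2
            return 1
            ;;
    esac
}

create_encrypted_cmd aes-256-gcm tank/secure
```

Pinning the cipher this way keeps newly created datasets consistent across hosts that run different OpenZFS versions.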

There’s more to OpenZFS native encryption than the algorithms used, though—so we’ll try to give you a brief but solid grounding in the sysadmin’s-eye perspective on the “why” and “what” as well as the simple “how.”

Why (or why not) OpenZFS native encryption?

A clever sysadmin who wants to provide at-rest encryption doesn’t actually need OpenZFS native encryption, obviously. As mentioned in the introduction, LUKS, VeraCrypt, and many other schemes are available and can be layered either beneath or atop OpenZFS itself.

First, the “why not”

Putting something like Linux’s LUKS underneath OpenZFS has an advantage—with the entire disk encrypted, an enterprising attacker can no longer see the names, sizes, or properties of ZFS datasets and zvols without access to the key. In fact, the attacker can’t necessarily see that ZFS is in use at all!

But there are significant disadvantages to putting LUKS (or similar) beneath OpenZFS. One of the gnarliest is that each individual disk which will be part of the pool must be encrypted, with each volume loaded and decrypted prior to the ZFS pool import stage. This can be a noticeable challenge for ZFS systems with many disks—in some cases, many tens of disks. Another problem with encryption-beneath-ZFS is that the extra layer is an extra thing to go wrong—and it’s in a position to undo all of ZFS’ normal integrity guarantees.

Putting LUKS or similar atop OpenZFS gets rid of the aforementioned problems: a LUKS-encrypted zvol only needs one key regardless of how many disks are involved, and the LUKS layer cannot undo OpenZFS’ integrity guarantees from here. Unfortunately, encryption-atop-ZFS introduces a new problem: it effectively nerfs OpenZFS inline compression, since encrypted data is generally incompressible. This approach also requires one zvol per encrypted filesystem, along with a guest filesystem (e.g., ext4) with which to format the LUKS volume.

Now, the “why”

OpenZFS native encryption splits the difference: it operates atop the normal ZFS storage layers and therefore doesn’t nerf ZFS’ own integrity guarantees. But it also doesn’t interfere with ZFS compression, because data is compressed before it is encrypted and written to an encrypted dataset or zvol.

There’s an even more compelling reason to choose OpenZFS native encryption, though—something called “raw send.” ZFS replication is ridiculously fast and efficient—frequently several orders of magnitude faster than filesystem-neutral tools like rsync—and raw send makes it possible not only to replicate encrypted datasets and zvols, but to do so without exposing the key to the remote system.

This means that you can use ZFS replication to back up your data to an untrusted location, without concerns about your private data being read. With raw send, your data is replicated without ever being decrypted, and without the backup target ever being able to decrypt it at all. You can replicate your offsite backups to a friend’s house or to a commercial service like rsync.net or zfs.rent without compromising your privacy, even if the service (or friend) is itself compromised.

In the event that you need to recover your offsite backup, you can simply replicate it back to your own location—then, and only then, loading the decryption key to actually access the data. This works for either full replication (moving every single block across the wire) or asynchronous incremental replication (beginning from a commonly held snapshot and only moving the blocks which have changed since that snapshot).
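
The full and incremental raw replication described above can be sketched as follows. The helper name raw_send_cmd, the snapshot names monday and tuesday, the host backup.example.com, and the pool backuppool are all hypothetical. The --raw flag (short form -w) is the zfs send option that ships blocks still encrypted. The helper only prints the pipeline rather than running it.

```shell
# Build a raw (still-encrypted) replication pipeline.
# raw_send_cmd, the host, and the pool names are illustrative assumptions.
raw_send_cmd() {
    src="${1:-}"        # source dataset, e.g. banshee/encrypted
    new_snap="${2:-}"   # snapshot to send
    target="${3:-}"     # ssh destination, e.g. user@backup.example.com
    dest="${4:-}"       # dataset to receive into on the remote pool
    base_snap="${5:-}"  # optional: commonly held snapshot for an incremental send

    if [ -n "$base_snap" ]; then
        # Incremental: only blocks changed since base_snap cross the wire.
        printf 'zfs send --raw -i %s@%s %s@%s | ssh %s zfs receive %s\n' \
            "$src" "$base_snap" "$src" "$new_snap" "$target" "$dest"
    else
        # Full: every block of the snapshot is sent.
        printf 'zfs send --raw %s@%s | ssh %s zfs receive %s\n' \
            "$src" "$new_snap" "$target" "$dest"
    fi
}

raw_send_cmd banshee/encrypted monday user@backup.example.com backuppool/encrypted
raw_send_cmd banshee/encrypted tuesday user@backup.example.com backuppool/encrypted monday
```

In neither pipeline does a zfs load-key step appear on the receiving side, which is the point: the backup target stores the data without ever holding the key.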

What’s encrypted—and what isn’t?

OpenZFS native encryption isn’t a full-disk encryption scheme—it’s enabled or disabled on a per-dataset / per-zvol basis, and it cannot be turned on for entire pools as a whole. The contents of encrypted datasets or zvols are protected from at-rest spying—but the metadata describing the datasets/zvols themselves is not.

Let’s say we create an encrypted dataset named banshee/encrypted, and beneath it we create several more child datasets. The encryption property for the children is inherited by default from the parent dataset, so we can see the following:

root@banshee:~# zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase banshee/encrypted
Enter passphrase: 
Re-enter passphrase: 

root@banshee:~# zfs create banshee/encrypted/child1
root@banshee:~# zfs create banshee/encrypted/child2
root@banshee:~# zfs create banshee/encrypted/child3

root@banshee:~# zfs list -r banshee/encrypted
NAME                       USED  AVAIL     REFER  MOUNTPOINT
banshee/encrypted         1.58M   848G      432K  /banshee/encrypted
banshee/encrypted/child1   320K   848G      320K  /banshee/encrypted/child1
banshee/encrypted/child2   320K   848G      320K  /banshee/encrypted/child2
banshee/encrypted/child3   320K   848G      320K  /banshee/encrypted/child3

root@banshee:~# zfs get encryption banshee/encrypted/child1
NAME                      PROPERTY    VALUE        SOURCE
banshee/encrypted/child1  encryption  aes-256-gcm  -
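
Beyond the cipher itself, the encryptionroot property reports which dataset’s key actually governs each child. As a hedged sketch (the helper name datasets_with_other_root is ours, and it assumes the tab-separated output that zfs get -H produces), the following lists any dataset under a parent whose encryption root is something other than that parent:

```shell
# List datasets under a parent whose encryption root is NOT the parent itself,
# e.g. unencrypted children ("-") or children that were given their own key.
# datasets_with_other_root is an illustrative helper, not an OpenZFS command.
datasets_with_other_root() {
    parent="${1:-}"
    # -r: recurse into children; -H: no header, tab-separated fields.
    zfs get -r -H -o name,value encryptionroot "$parent" |
        awk -F '\t' -v root="$parent" '$2 != root { print $1 " -> " $2 }'
}

# Example (needs a real pool): datasets_with_other_root banshee/encrypted
```

For the tree created above, we would expect this to print nothing, since all three children inherit from banshee/encrypted.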

At the moment, our encrypted datasets are all mounted. But even if we unmount them and unload the encryption key—making them inaccessible—we can still see that they exist, along with their properties:

root@banshee:~# wget -qO /banshee/encrypted/child2/HuckFinn.txt http://textfiles.com/etext/AUTHORS/TWAIN/huck_finn

root@banshee:~# zfs unmount banshee/encrypted
root@banshee:~# zfs unload-key -r banshee/encrypted
1 / 1 key(s) successfully unloaded

root@banshee:~# zfs mount banshee/encrypted
cannot mount 'banshee/encrypted': encryption key not loaded

root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory

root@banshee:~# zfs list -r banshee/encrypted
NAME                       USED  AVAIL     REFER  MOUNTPOINT
banshee/encrypted         2.19M   848G      432K  /banshee/encrypted
banshee/encrypted/child1   320K   848G      320K  /banshee/encrypted/child1
banshee/encrypted/child2   944K   848G      720K  /banshee/encrypted/child2
banshee/encrypted/child3   320K   848G      320K  /banshee/encrypted/child3

As we can see above, after unloading the encryption key, we can no longer see our freshly-downloaded copy of Huckleberry Finn in /banshee/encrypted/child2/. What we can still see is the existence—and structure—of our entire ZFS-encrypted tree. We can also see each encrypted dataset’s properties, including but not limited to the USED, AVAIL, and REFER of each dataset.
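
Because this metadata stays readable without the key, a script can find out which datasets are currently locked. A minimal sketch, assuming the keystatus property reads unavailable while a key is unloaded; the helper name locked_datasets is our own:

```shell
# Print every dataset under a parent whose encryption key is not loaded.
# locked_datasets is an illustrative helper name, not an OpenZFS command.
locked_datasets() {
    # -H gives header-less, tab-separated output that is easy to parse.
    zfs get -r -H -o name,value keystatus "${1:-}" |
        awk -F '\t' '$2 == "unavailable" { print $1 }'
}

# Example (needs a real pool): locked_datasets banshee/encrypted
```

Note that this works for any user who can run zfs get, which is exactly the metadata exposure described above.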

It’s worth noting that trying to ls an encrypted dataset which doesn’t have its key loaded won’t necessarily produce an error:

root@banshee:~# zfs get keystatus banshee/encrypted
NAME               PROPERTY   VALUE        SOURCE
banshee/encrypted  keystatus  unavailable  -
root@banshee:~# ls /banshee/encrypted
root@banshee:~# 

This is because the empty mountpoint directory still exists on the host, even when the actual dataset is not mounted. Reloading the key doesn’t automatically remount the dataset, either:

root@banshee:~# zfs load-key -r banshee/encrypted
Enter passphrase for 'banshee/encrypted': 
1 / 1 key(s) successfully loaded
root@banshee:~# zfs mount | grep encr
root@banshee:~# ls /banshee/encrypted
root@banshee:~# ls /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory
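
Since an empty ls result can’t distinguish an empty dataset from an unmounted one, it is safer to ask ZFS directly. A small sketch using the mounted property; the helper name dataset_is_mounted is hypothetical:

```shell
# Succeed only if ZFS itself reports the dataset as mounted.
# dataset_is_mounted is an illustrative helper, not an OpenZFS command.
dataset_is_mounted() {
    # -H -o value prints just "yes" or "no" for the mounted property.
    [ "$(zfs get -H -o value mounted "${1:-}")" = "yes" ]
}

# Example (needs a real pool):
#   dataset_is_mounted banshee/encrypted/child2 || echo "not mounted"
```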

In order to access our fresh copy of Huckleberry Finn, we’ll also need to actually mount the freshly key-reloaded datasets:

root@banshee:~# zfs get keystatus banshee/encrypted/child2
NAME                      PROPERTY   VALUE        SOURCE
banshee/encrypted/child2  keystatus  available    -

root@banshee:~# ls -l /banshee/encrypted/child2
ls: cannot access '/banshee/encrypted/child2': No such file or directory

root@banshee:~# zfs mount -a
root@banshee:~# ls -lh /banshee/encrypted/child2
total 401K
-rw-r--r-- 1 root root 554K Jun 13  2002 HuckFinn.txt

Now that we’ve both loaded the necessary key and mounted the datasets, we can see our encrypted data again.
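
The two steps can be wrapped in one helper that is scoped to a single subtree. This is a sketch under stated assumptions rather than a definitive implementation: unlock_and_mount is our own name, and it assumes the dataset passed in is encrypted and is the only encryption root in its subtree.

```shell
# Load the key for an encrypted subtree if needed, then mount whatever is
# not mounted yet. unlock_and_mount is an illustrative helper; it assumes
# the named dataset is the only encryption root in the subtree.
unlock_and_mount() {
    parent="${1:-}"
    tab="$(printf '\t')"

    # Skip load-key when the key is already available.
    if [ "$(zfs get -H -o value keystatus "$parent")" != "available" ]; then
        zfs load-key -r "$parent" || return 1
    fi

    # Mount only this subtree, unlike `zfs mount -a`, which mounts all
    # available ZFS filesystems.
    zfs list -H -r -t filesystem -o name,mounted "$parent" |
        while IFS="$tab" read -r name mounted; do
            # stdin is redirected so zfs cannot consume the dataset list.
            [ "$mounted" = "yes" ] || zfs mount "$name" </dev/null
        done
}

# Example (needs root and a real pool): unlock_and_mount banshee/encrypted
```

The key is loaded outside the loop on purpose: inside the pipeline, standard input is the dataset list, so a passphrase prompt there could not read from the terminal.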