
I got a question about ZFS snapshotting:

So, as far as I understand, when I take a snapshot of a ZFS dataset, the snapshot will initially be 0 bytes in size, because at the moment of creation there is no difference in files between the dataset itself and the snapshot, right?

When I now delete a 1GB file in the live dataset, the live dataset becomes 1GB smaller and the snapshot becomes 1GB bigger because that file is now only referenced in the snapshot.

Now, if I have multiple snapshots from different time points (let's say 10, during all of which the 1GB file was present), after deleting the 1GB file ALL of the 10 snapshots will get 1GB bigger, right? However, I assume that they will only be shown as 1GB bigger but the actual blocks where the 1GB file was written are still only present on disk once, so the disk itself is only filled up with 1GB and all of the 10 snapshots are each referencing that 1GB, correct?

In other words: All 10 snapshots will show (in "zfs list -t snap" for example) that they are 1GB big, but all 10 together will only take up 1GB of actual space on the disk?

If that is the case: How does replication via syncoid factor into this? When I have a backup server that is pulling snapshots from my main server via syncoid and those snapshots on the main server become bigger (because files were deleted), will syncoid pull these changed snapshots again (or at least the delta of what changed), even though they were already pulled before? If they ARE pulled again (or their delta): Will only the references be updated or will the snapshots pulled to the backup server then each contain the actual data of the deleted files? In other words: Will a deleted 1GB file lead to 1GB of space taken up on the BACKUP server as well, or will that take up 10GB (1GB per snapshot) on that machine?

Thanks in advance for any help!

1 Answer

To answer your questions, here's my understanding:

The very first snapshot of the dataset is very important: it is a point-in-time picture of all the data in your dataset, so its referenced size will match the total size of the live dataset very closely. It serves as the base that all later snapshots build on, so it is critical to always keep this first snapshot. I would encourage you to rename it to something like "BASE-2023-10-16_13:42" and never delete it. This will help with any future zfs send / zfs recv (or syncoid) operations.
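A minimal sketch of taking and renaming such a base snapshot; the pool and dataset names (`tank/data`) are placeholders:

```shell
# Take the initial full snapshot of the dataset (names are examples)
zfs snapshot tank/data@BASE-2023-10-16_13:42

# Or take it under a temporary name first and rename it afterwards:
zfs snapshot tank/data@initial
zfs rename tank/data@initial tank/data@BASE-2023-10-16_13:42
```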

After the initial snapshot, the number of data blocks used to store the dataset and its snapshots will never drop below what that first snapshot references.

However, the total storage size will increase or decrease as you add or remove files between snapshots. Deleting a file in your dataset doesn't affect the total storage used by your snapshots until you take another snapshot. If you deleted a file since the last snapshot, the dataset size will decrease and the next snapshot will show 0 bytes of new data. If you added a file (or files) since the last snapshot, the dataset size will increase and the next snapshot will show new data equal to the size of the new file (or files).

And to clarify, the total storage consumed by all the snapshots is really the size of the unique data blocks used across all snapshots and the live dataset.
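You can inspect this accounting yourself with the standard ZFS properties; `used` on a snapshot is the space unique to that snapshot, while `usedbysnapshots` on the dataset covers all of its snapshots together (the dataset name is a placeholder):

```shell
# Space unique to each snapshot vs. the data it references
zfs list -t snapshot -o name,used,refer tank/data

# Total space consumed by all snapshots of the dataset combined
zfs get usedbysnapshots tank/data
```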

When I now delete a 1GB file in the live dataset, the live dataset becomes 1GB smaller and the snapshot becomes 1GB bigger because that file is now only referenced in the snapshot.

Actually, the snapshot's data storage doesn't change when you delete the file in the dataset. The previous snapshot still has that data stored; the next snapshot simply no longer references the deleted file. So that next snapshot will refer to less data, but the total data storage used across all the snapshots stays the same.

. . . they will only be shown as 1GB bigger but the actual blocks where the 1GB file was written are still only present on disk once, so the disk itself is only filled up with 1GB and all of the 10 snapshots are each referencing that 1GB, correct?

Correct.

. . . All 10 snapshots will show (in "zfs list -t snap" for example) that they are 1GB big, but all 10 together will only take up 1GB of actual space on the disk?

Actually, `zfs list -t snap` will show the first snapshot at 1 GB and the following snapshots at 0, i.e. each one shows the amount of storage added since the previous snapshot.
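The per-snapshot delta described here corresponds to ZFS's `written` property; a sketch of comparing it with `used` (dataset name is an example):

```shell
# 'written' shows data written since the previous snapshot;
# 'used' shows space that would be freed by destroying only that snapshot
zfs list -t snapshot -o name,used,written tank/data
```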

As for your questions about a backup server and using syncoid to copy snapshots: in general, your copy of the original pool and snapshots will never exceed the original data storage size. Replicating snapshots generally only sends the differences between snapshots, not the entire dataset each time. The structure and space usage on the backup will mirror the original source.
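For reference, a typical syncoid pull run on the backup server looks like this (hostnames and pool names are placeholders); syncoid finds the newest snapshot common to both sides and sends only the incremental stream since then, so already-transferred snapshots are not resent:

```shell
# On the backup server: pull tank/data from the main server incrementally
syncoid root@mainserver:tank/data backuppool/data

# Recursively replicate a whole dataset tree
syncoid -r root@mainserver:tank backuppool/tank
```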

Hope that helps with your ZFS understanding.
