Questions tagged [deduplication]

The tag has no usage guidance.

0 votes
1 answer
37 views

rsync cache/deduplication for similar files

I am using rsync to transfer/update hundreds of TB (millions of files) from one file server to another. Many of those files are quite similar (but not identical). I was wondering if there's a way to benefit ...
uli42's user avatar
  • 315
1 vote
1 answer
255 views

Dovecot process synchronization for duplicate email detection

We have a Dovecot server, version 2.3.16, with the Sieve duplicate extension enabled to detect and discard duplicates. We have duplicate routing in the Postfix virtual_aliases file, which causes duplicate copies ...
device_exec's user avatar
0 votes
0 answers
63 views

Duplicate files with the server name on a File Server

For some time now, on a file server where I have a folder shared with the company's departments, some folders have been duplicating files by appending the name ...
Ricardo Prada's user avatar
0 votes
0 answers
48 views

Deduplication inside VHDX

I want to use deduplication on files coming from a VHDX, but it doesn't seem to work: it doesn't optimize space. I tested it on another 'normal' file system (containing xlsx, pst, doc files, etc.) ...
GoldenH4ck's user avatar
1 vote
1 answer
484 views

Migrating file server with Windows Deduplication by switching disk

I'm migrating a bunch of large file servers to new Windows VMs as a way of updating the OS, planning on switching the virtual disks over and exporting/importing the lanman share registry and retaining ...
N8_AS's user avatar
  • 21
0 votes
1 answer
98 views

Recover files from deduplicated volume in Windows Storage Server 2016

We have a Windows Storage Server 2016 with some volumes that have Storage Deduplication enabled. Its OS crashed, so we had to reinstall it (we couldn't restore from backup). There are some files which ...
Piyush B's user avatar
  • 103
1 vote
1 answer
221 views

zfs zpool dedup stat seems very wrong - how to interpret these values?

I run a fileserver receiving backups of user containers. Two containers are poorly run Docker systems with hundreds of near-identical directories, using neither overlayfs nor ZFS clones. (I cannot touch ...
math's user avatar
  • 230
1 vote
1 answer
107 views

rsync Delete only Matching files

How can I use rsync (or another program) to delete identical files between directories? To be identical, location and metadata should match (permissions, ownership, timestamp). For example, I back up ...
user206746's user avatar
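
A minimal sketch of what the question above asks for, in Python rather than rsync: walk a backup tree and delete only those files whose counterpart at the same relative path in the source tree has identical content and metadata. The deletion direction, the metadata fields checked, and the dry-run default are assumptions, not anything stated in the question.

#!/usr/bin/env python3
"""Delete files in a backup tree that are identical (content + metadata)
to the corresponding file in the source tree. Hedged sketch only."""
import filecmp
import os
import sys

def same_metadata(a: os.stat_result, b: os.stat_result) -> bool:
    # Compare permissions, ownership and modification time.
    return (a.st_mode == b.st_mode
            and a.st_uid == b.st_uid
            and a.st_gid == b.st_gid
            and int(a.st_mtime) == int(b.st_mtime))

def prune_identical(source_root: str, backup_root: str, dry_run: bool = True) -> None:
    for dirpath, _dirs, files in os.walk(backup_root):
        rel = os.path.relpath(dirpath, backup_root)
        for name in files:
            backup_file = os.path.join(dirpath, name)
            source_file = os.path.join(source_root, rel, name)
            if not os.path.isfile(source_file):
                continue  # no counterpart in the source tree, keep it
            if not same_metadata(os.stat(source_file), os.stat(backup_file)):
                continue  # metadata differs, keep it
            if not filecmp.cmp(source_file, backup_file, shallow=False):
                continue  # same metadata but different content, keep it
            print(("would delete " if dry_run else "deleting ") + backup_file)
            if not dry_run:
                os.remove(backup_file)

if __name__ == "__main__":
    prune_identical(sys.argv[1], sys.argv[2], dry_run=True)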
0 votes
0 answers
136 views

How to dedup existing rsnapshot on ZFS?

I have a backup box using ZFS (on Ubuntu 20.04 LTS) where I use rsnapshot to back up a number of websites. These websites have a fair amount of duplicated data. I have just enabled de-duplication of ...
davidgo's user avatar
  • 6,242
0 votes
1 answer
164 views

Windows hardlinks - why do they consume the same space as the original file?

I'm using the finddupe tool to de-duplicate files and save some space on an HDD, but comparing before and after the operation I can't see any free space released - see below. Can somebody advise what is wrong here? ...
Maciej's user avatar
  • 123
-1 votes
2 answers
964 views

Find file duplicates and convert them into links [WINDOWS] [closed]

My users tend to save tons of duplicate files, which consumes more and more space and generates hardware and archiving costs. I'm thinking of creating a scheduled job to: find duplicate files (check file MD5 ...
Maciej's user avatar
  • 123
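
A minimal sketch of the kind of scheduled job described above, assuming NTFS hardlinks are acceptable and that MD5 (as mentioned in the question) is sufficient to identify duplicates. The root path is a placeholder, and hardlinks can only be created within a single volume.

#!/usr/bin/env python3
"""Group files by size, then by MD5, and replace each duplicate with a
hardlink to the first copy. Hedged sketch; test on a copy first."""
import hashlib
import os
from collections import defaultdict

def md5sum(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()

def hardlink_duplicates(root: str, dry_run: bool = True) -> None:
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            by_size[os.path.getsize(p)].append(p)

    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # unique size (or empty file): cannot be a space-saving duplicate
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[md5sum(p)].append(p)
        for dupes in by_hash.values():
            original, rest = dupes[0], dupes[1:]
            for dup in rest:
                print(f"{'would link' if dry_run else 'linking'} {dup} -> {original}")
                if not dry_run:
                    os.remove(dup)
                    os.link(original, dup)  # NTFS hardlink; same volume only

if __name__ == "__main__":
    hardlink_duplicates(r"D:\shares", dry_run=True)  # hypothetical root path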
0 votes
1 answer
965 views

Workaround for Windows Search/Indexing on a deduplicated volume

According to Microsoft, Windows Server 2019 still does not support Windows Search on Data Deduplication-enabled volumes (source): Windows Search doesn't support Data Deduplication. Data ...
stackprotector's user avatar
1 vote
1 answer
377 views

Dedup.sys 0x7E after in-place upgrade Server 2016 => 2019

I've recently performed an in-place upgrade from Server 2016 => 2019 (non-domain). Two of the connected drives have been configured with data deduplication (mostly VHDs for Hyper-V). Ever since the ...
Callumwill1991's user avatar
3 votes
1 answer
2k views

Cheap/fast deduplication with hardlinks?

I've got shared hosting with a few thousand WordPress installs and I've wanted for ages to have a sensible and secure way of removing all the duplicate files. I'm looking for better disk ...
Nick's user avatar
  • 287
2 votes
1 answer
1k views

ZFS interpret output of zdb -S tank

I wanted to know whether it would pay off for me to activate ZFS deduplication, so I ran the command zdb -S tank, but now I need some help interpreting the output. Simulated DDT histogram: bucket ...
White's user avatar
  • 25
1 vote
0 answers
551 views

btrfs compress existing data with snapshots

I'm running a btrfs raid1 on five 8 TB disks, to which primarily UrBackup stores backups. UrBackup uses snapshots extensively. compsize produces the following output: Processed 106161732 files, 57335496 ...
Christian's user avatar
  • 111
3 votes
1 answer
113 views

Set up RAID 10 for backup store

We have an HP DL380p Gen8 with a Smart Array P420i: 10 x 6 TB HDDs in two RAID 10 arrays (6 and 4 disks), used as a backup store via Arcserve UDP with deduplication backup configured for 16 KB blocks. We are not satisfied with the speed ...
mgkmgk's user avatar
  • 31
4 votes
0 answers
740 views

Windows NTFS Data Deduplication and Snapshot Backups

We have a file server (fs00) on Google Cloud Platform (GCP): running Windows Server 2019 (with Desktop Experience installed), one OS/system disk (250 GB SSD), and an independent data disk (5 TB standard ...
iamacarpet's user avatar
0 votes
0 answers
2k views

ZFS send/receive fails consistently with "Input/output error"

I have a ZFS disk with deduplication turned on. I have copied data from many other old hard disks onto this ZFS disk, ultimately with the intention of getting rid of all the original disks. I need ...
Pryo's user avatar
  • 675
2 votes
1 answer
490 views

ZFS deduplication

This is my first post on this website, so please be kind to me :) On CentOS 7, having used ZFS deduplication in the past on a storage pool, I moved all filesystems that had dedup enabled to another storage ...
Nicolas-V13's user avatar
2 votes
0 answers
129 views

Windows 2016 deduplication and application compatibility

I'm re-evaluating the Windows 2016/2019 deduplication engine, which is way better (faster/more capable) than what shipped in previous Windows versions. I understand how it works (sparsifying files via ...
shodanshok's user avatar
  • 48.5k
0 votes
0 answers
44 views

Postfix/Dovecot: avoid duplicate sends from two different IPs but the same computer, same sender/recipients, same subject & body

Is there a way to avoid duplicate sends from two different IPs but the same computer, same sender, same recipients, same subject & body? I'm using a Postfix/Dovecot server to which one of the ...
Fabien Auréjac's user avatar
5 votes
2 answers
2k views

Offline Deduplication Software for ZFS

I have a 300 TB FreeNAS server to back up several Linux nodes. Backup works with daily snapshot and rsync tasks. The users often move big datasets (2-5 TB) between the servers, so big files are often ...
philipp's user avatar
  • 101
2 votes
1 answer
183 views

puppet code design: how to collect arguments into a string (avoiding possible duplicate declaration errors)

I'm having trouble with Puppet (version 5.5) code design. I've written a component module to deal with ucarp. It makes use of the eyp-systemd module to register a ucarp service with systemd. Now I ...
C.Scharfenberg's user avatar
1 vote
0 answers
157 views

Is Path-based Unoptimization Possible?

I have a deduplicated volume in Windows Server 2016 that some software is having problems dealing with. Specifically, Matlab 2015 and older couldn't read deduplicated files, and while updating to a ...
Louis Waweru's user avatar
1 vote
0 answers
311 views

Deduplication savings differ a lot between the same two volumes

I run this robocopy command to copy all data from the existing volume to my new volume, which will replace the old one in the near future: robocopy I:\ E:\ /E /COPYALL /PURGE /MIR /B /R:1 /W:1 /XD "...
donduc_jr's user avatar
6 votes
1 answer
2k views

Windows 2012 Robocopy with Deduplication active

Current issue: a 14 TB file cluster disk needs a CHKDSK, which can't be done because it takes too long. So I'm going to replace this disk in the cluster's storage. I have run many times the ...
donduc_jr's user avatar
1 vote
1 answer
114 views

VHDX drive space being consumed

I have a host running Server 2012 R2, and a file server VM also running 2012 R2. The VM has three drives: OS.vhdx, file-1.vhdx and file2.vhdx. Dedupe is running inside the VM on file-1.vhdx and file2....
morleyc's user avatar
  • 1,150
13 votes
5 answers
21k views

Apparently Robocopy does not understand Windows Server 2016 deduplication. Is there a way to make it work without corrupting the volume?

Scenario: doing a server migration from old Server 2008 R2 to new Server 2016, following this Server Fault guide: File server migration using Robocopy. After Robocopy completes, enable deduplication ...
Dale Mahalko's user avatar
1 vote
0 answers
52 views

Deduplication assistance

We have consolidated a couple of drives and NAS systems of a former colleague before the disks failed (some actually showed signs of degradation). Now, we know that the colleague has done "RAID by ...
Marco Schuster's user avatar
4 votes
2 answers
5k views

How to verify that a deduplication has taken place?

Microsoft Windows Server 2012 and onwards offers a de-duplication service that periodically scans files, finds identical chunks, and removes excess copies to save space. To the user browsing the ...
DraxDomax's user avatar
  • 139
1 vote
0 answers
649 views

How to calculate dedupe ratio per folder

Is there a way to determine the zfs dedup ratio on a per folder basis? I realize the potential difficulties in computing this as every block would need to be analyzed, but can such a method be ...
Dan Buhler's user avatar
0 votes
1 answer
2k views

Size on disk smaller than capacity but no free space

We have a Windows 2012 Veeam server with a volume that we use to store our backups. The volume has a capacity of 14.9 TB and has data deduplication enabled. On the volume there are folders for the ...
Skanderbeg's user avatar
1 vote
1 answer
2k views

Windows deduplication doesn't yield usable space

[Dedup size picture] Hi all, when I set up Windows deduplication, I get great savings - see the attached picture: 1.28 TB deduped down to 23 GB. However, the space savings are not real. Let's say that ...
GeoffM's user avatar
  • 75
0 votes
0 answers
70 views

recurring, queryable, cached checksumming of all files in a filesystem

I'm looking for a way to efficiently manage and leverage file-level checksums for all files in a filesystem over time. Goals: Configurable, fast refresh - only re-checksumming large files when ...
Royce Williams's user avatar
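
A minimal sketch of one way to approach the goal above: keep a per-file checksum table in SQLite and re-hash a file only when its size or mtime changes. The database location, the SHA-256 choice, and the duplicate query at the end are assumptions, not a reference to any existing tool.

#!/usr/bin/env python3
"""Recurring, queryable, cached checksumming: one SQLite row per file,
refreshed only when size or mtime differs from the cached value."""
import hashlib
import os
import sqlite3

DB = "checksums.sqlite"  # hypothetical cache location

def sha256sum(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()

def refresh(root):
    con = sqlite3.connect(DB)
    con.execute("""CREATE TABLE IF NOT EXISTS files (
                     path TEXT PRIMARY KEY, size INTEGER,
                     mtime REAL, sha256 TEXT)""")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            st = os.stat(p)
            row = con.execute("SELECT size, mtime FROM files WHERE path=?",
                              (p,)).fetchone()
            if row and row[0] == st.st_size and row[1] == st.st_mtime:
                continue  # unchanged since last run, keep the cached checksum
            con.execute("REPLACE INTO files VALUES (?,?,?,?)",
                        (p, st.st_size, st.st_mtime, sha256sum(p)))
    con.commit()
    return con

if __name__ == "__main__":
    con = refresh("/data")  # hypothetical filesystem root
    # Example query: checksums that occur more than once (duplicate content).
    for sha, n in con.execute(
            "SELECT sha256, COUNT(*) FROM files GROUP BY sha256 HAVING COUNT(*) > 1"):
        print(sha, n)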
3 votes
1 answer
3k views

Can an unoptimize dedup job be resumed?

I have a Start-DedupJob -type Unoptimize ... that's been running for 7 hours and is still at 0% progress. I'm reconfiguring the server and need to restart. Disk activity has been pegged to near ...
Louis Waweru's user avatar
2 votes
1 answer
565 views

How to read fully deduplicated files over SMB on macOS?

I have a volume in Windows Server 2016 that is a target of the data deduplication role. It works very well, with a ~60% savings rate and many files appearing as zero bytes on the disk. The volume is ...
Louis Waweru's user avatar
2 votes
1 answer
2k views

How to find duplicate files against a reference directory structure in Linux

There are a couple of duplicate file finders for Linux listed e.g. here. I have already tried fdupes and fslint. However, from what I have seen, these will find all duplicates of the selected ...
packoman's user avatar
  • 175
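
A minimal sketch of the one-way comparison the question above asks for (unlike fdupes, which pools all given directories together): index the reference tree by content hash, then report files in the scan tree whose content already exists in the reference. The directory arguments and the SHA-1 choice are assumptions.

#!/usr/bin/env python3
"""Report files under a scan directory whose content already exists
somewhere under a reference directory. Hedged sketch only."""
import hashlib
import os
import sys

def sha1sum(path, chunk=1 << 20):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            h.update(data)
    return h.hexdigest()

def walk_files(root):
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            yield os.path.join(dirpath, name)

def main(reference_dir, scan_dir):
    # Index the reference tree once: hash -> one example path.
    reference = {}
    for p in walk_files(reference_dir):
        reference.setdefault(sha1sum(p), p)
    # Report scan-tree files whose content is already in the reference tree.
    for p in walk_files(scan_dir):
        match = reference.get(sha1sum(p))
        if match:
            print(f"{p} duplicates {match}")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])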
3 votes
1 answer
2k views

zfs wrong space usage

I have a backup server with ZFS (Ubuntu 16.04; 32 GB RAM, 4 x 6 TB HDD, raidz2). Recently I've found a problem with the available space. # zpool list -v NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP ...
Karol Czachorowski's user avatar
1 vote
1 answer
811 views

Unable to mount borg backup repository residing on a ZFS snapshot

I've posted the question and my own solution to this problem, to make it available to others. This relates to borg backup and was posted to this borg github issue: Allow check on read-only ...
zoot's user avatar
  • 257
4 votes
1 answer
3k views

On deduped volumes, how can I determine space used in a folder with Measure-DedupFileMetadata

I'm trying to understand how Measure-DedupFileMetadata works so I can recursively go through some folders to report on how much space is actually used. I don't know how to interpret the output. If I ...
Dan Buhler's user avatar
3 votes
1 answer
6k views

How to use OverlayFS with Docker volumes?

For some use cases I'd like to be able to create a volume with docker volume create and fill it with data. Then I want to create a new volume that is only a copy of the first one, but no ...
tabb's user avatar
  • 131
1 vote
0 answers
465 views

Clarifying sizing of dedup table on ZFS when only some datasets are deduped

I am trying to be clear about how zfs dedup handles the case where some (but not all) datasets in a pool are deduped, from a dedup table/RAM impact perspective. I found this quote from the FreeBSD ...
Stilez's user avatar
  • 714
7 votes
3 answers
5k views

Windows 2016 Storage Spaces Direct + Data Deduplication

Has anyone combined S2D (Storage Spaces Direct) with Data Deduplication? Is this even possible or recommended practice? Please elaborate on why this is a good idea or not. EDIT: Just stumbled ...
Paul Masek's user avatar
1 vote
0 answers
74 views

Better to deduplicate inside VHDX on client or outside VHDX on host? [duplicate]

I have a 16 TB array that I'm hosting in a virtual environment. It contains video and photos from our marketing department. The data resides on the Hyper-V host itself as a VHDX file on an 8-drive SATA ...
Mike Capps's user avatar
0 votes
0 answers
54 views

Is there a way to measure or change how fast Windows Server 2012 rehydrates data that it previously deduplicated?

Since the deduplication option was enabled on the Windows server (currently at about a 40 percent deduplication rate overall), the Backup Exec 16 job has become much slower - less than 30 ...
Kit's user avatar
  • 101
8 votes
1 answer
7k views

NTFS Compression with Deduplication?

I have installed the Data Deduplication role on Windows Server 2016. Per Microsoft, it's using NTFS compression to help save space: Data deduplication stores more data in less physical space. It ...
Andrew Bucklin's user avatar
4 votes
2 answers
4k views

Deduplication and cost savings on IaaS object stores (S3/Azure Blobs)

Do any of the commercial IaaS object stores (S3, Azure Blobs etc.) avoid charging multiple times for storing duplicate data (identical files, or parts of files)? For instance, we have a 15 TB dataset ...
Jedi's user avatar
  • 468
1 vote
1 answer
58 views

18.1 TB Drive Shows .75 TB Free but only 4.66 TB Deduped on Windows Server 2012 R2

I have a Dell PE R520 with an 18.1 TB drive (built on RAID 6, disks internal to the server). I set up Microsoft's deduplication functionality on this drive and began moving large amounts of material onto it. ...
David Mackey's user avatar
1 vote
0 answers
2k views

Error 0x80070519 reading file: revision level is unknown

I have mounted a dedup volume in Windows Server 2012 R2. When I try to copy a file from the volume to another, I get the error 0x80070519: the revision level is unknown. The volume seems to be healthy ...
Ronnie Overby's user avatar