
Context: We are working on a data migration project that synchronizes data between an on-premises NFS filesystem and an Azure NFS file share. The goal is a seamless transition from the on-premises environment to Azure while maintaining data integrity and efficiency.

Background:

Source: On-premises NFS filesystem.

Destination: Azure NFS-based file share.

Data Size: Approximately 350 GB.

Tool Usage:

AzCopy (Not Supported): We initially attempted to use AzCopy for the migration, but discovered it is not supported for NFS-enabled Azure file shares.

rsync (Storage Growth Issue): We then turned to rsync for data synchronization, but we encountered significant storage growth at the Azure destination and the process never completed. The storage kept increasing for no apparent reason, forcing us to abandon rsync.

fpsync (Successful First Attempt): To address the storage growth issue, we switched to fpsync, which showed promise: the initial synchronization completed successfully.

The Issue: Unexplained Storage Growth: Our primary challenge is unexplained growth in storage utilization at the Azure NFS destination, especially with rsync. Even though the source data size remains constant, the destination storage increases significantly, making the process unmanageable.

Objective: We are seeking insights, advice, or solutions from the community to help troubleshoot and resolve this storage growth issue. Our goal is to ensure efficient data synchronization and minimal storage usage at the Azure destination.

Additional Information: The source data, including hidden files and directories, is correctly formatted and named.

Permissions are preserved during synchronization.

While the first fpsync run succeeded, subsequent synchronizations still exhibited the same storage growth. Any suggestions, insights, or experiences related to this issue would be greatly appreciated. We are looking to resolve this challenge and ensure a successful data migration to Azure NFS.

Update:

I have now tried the rclone utility and hit the same issue.

  • Please edit your question to add the exact command lines attempted. rsync behavior is heavily modified by options, such as which --delete variant is used, if any. And it looks like fpsync has even more options. Also, please describe file name patterns, file counts, and whether files are deleted in normal operations. A sync script not deleting files gone from the source is a bit different from giant files that keep getting appended to. Sep 24 at 22:08
  • Here is the fpsync command: fpsync -n 8 -v "$subdir" "$destination_directory/$subdirname"
    – green
    Sep 25 at 20:17
  • Here is the rsync command: rsync -avz --progress /source /destination
    – green
    Sep 25 at 20:19
  • Is snapshotting a feature of Azure NFS? Sep 27 at 7:12
  • Can you do a diff compare between the Azure NFS share and the on-prem NFS share: diff -qr directory-1/ directory-2/? That should highlight what you need to investigate in the Azure NFS share to explain the storage growth.
    – Ace
    Sep 27 at 16:28

1 Answer


Read man rsync carefully, and try options out with --dry-run --itemize-changes to see exactly what would be done.
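
For instance, a non-destructive preview might look like this (the paths are placeholders; note the trailing slashes, which sync directory contents rather than the directory itself):

    rsync -avz --dry-run --itemize-changes /source/ /destination/

Each output line starts with a change summary, such as >f+++++++++ for a newly transferred file, which makes unexpected re-transfers easy to spot.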

Not providing any delete option means a delete on the source will not be reflected on the target. That is great for archival use cases, but not as good for data with limited retention, like date-stamped log files. Also, avoid * wildcards if you want to delete files, per the man page:

   --delete
          This tells rsync to delete extraneous files from the receiving side (ones that aren't on the sending
          side), but only for the directories that are being synchronized. You must have asked rsync to send
          the whole directory (e.g. "dir" or "dir/") without using a wildcard for the directory's contents
          (e.g. "dir/*") since the wildcard is expanded by the shell and rsync thus gets a request to transfer
          individual files, not the files' parent directory.
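
A sketch of the difference, again with placeholder paths; the first form lets --delete prune files, while the second silently disables it:

    # Correct: rsync is given the parent directory, so --delete can remove extraneous files
    rsync -av --delete /source/dir/ /destination/dir/

    # Broken for deletion: the shell expands dir/* into individual file arguments
    rsync -av --delete /source/dir/* /destination/dir/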

"The default behavior is to create each temporary file in the same directory as the associated destination file." These temporary files allow the transfer to be aborted, but require significant extra space. Conservatively, assume twice the size of the source, for worst case scenarios of needing to update everything. Of the ways to change this behavior, perhaps the most aggressive is --inplace which overwites files directly. Danger: this will corrupt files in use on the destination, its not for active/active use cases.

Regarding performance, find what the limiting factors are on both the local and remote systems. If I make up worst-case numbers, a million files on slow 100-IOPS spindles might take hours just to enumerate and compare the file list. Once it gets around to copying file data, however, the bottleneck may shift to network bandwidth, or to CPU for ssh and compression.
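
As a back-of-envelope check with those made-up numbers:

    1,000,000 files ÷ 100 IOPS = 10,000 seconds ≈ 2.8 hours just to walk and stat the tree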

Come up with alternative plans for the initial copy that are not file sync tools. For example, take a local backup of the share and restore it to a host in Azure with that NFS share mounted. It is faster and simpler to copy an archive (.tar or whatever) over the network and extract it all, compared to an incremental file sync.
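
One sketch of that approach, assuming SSH access to an Azure VM that has the NFS share mounted (host and paths are placeholders):

    # Stream a tar archive over SSH and extract it directly onto the mounted share
    tar -cf - -C /source . | ssh user@azure-vm 'tar -xf - -C /mnt/azure-nfs'

Whether to add compression (-z) depends on where the bottleneck is: it helps on slow links and hurts when CPU is the constraint, so test both.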

Speaking of which, rsync might be useful as an incremental pass to catch things up after the initial copy. It will still take some time to compare, but it is much faster if the change rate is low and there is not much to copy.
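
For example, a final catch-up pass (placeholder paths; preview it with --dry-run first):

    rsync -av --delete /source/ /mnt/azure-nfs/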

  • Regarding the temporary storage space comment: when is this temporary storage reclaimed? In my case I ran fpsync, which runs rsync under the hood. The first sync consumed about 500 GB, roughly 150 GB more than the original data; then I ran a second sync a few hours later and the space grew to over 1 TB.
    – green
    Sep 28 at 19:21
