
I have a problem! I have a backup script in Python. It backs up every folder inside /var/www/ into a separate .tar.7z archive.

The problem is that compression is very slow, and for 4 GB folders it sometimes stops compressing at 1 GB, sometimes at 1.5 GB.

This is the line that does the compression:

os.system("tar cf - -C %s . 2>/dev/null 3>/dev/null | 7za a -p%s -si %s 1>/dev/null 2>/dev/null 3>/dev/null" % (cf, self.config.get(jn, "archpass"), filename))

When I run tar -cf compress-dir.tar /var/www/bigsite.com/ on the 4 GB folder directly, it creates the .tar extremely quickly; it is ready in a few minutes.

However, within the Python script, the temporary file that is created as soon as the .tar starts to be generated grows very slowly. After about 10 minutes it reaches about 1 GB... and soon after it stops growing, without showing any error in the console.

Is there a way I can reproduce what is happening here: tar cf - -C %s . 2>/dev/null 3>/dev/null directly in bash? Because clearly it's not the same as tar -cf compress-dir.tar /var/www/bigsite.com/, which runs much faster.

Maybe if I run the tar directly in bash, an error will appear. Of course, if you have any other ideas, please let me know.

  • Most tar versions have gzip compression built in via the z option. Gzip is not as efficient as 7-Zip in terms of compressed size, but it may be a little faster. For the 7-Zip problem, I would suspect an old or defective version of 7-Zip, since 7-Zip should not have problems with large files. You also don't have to stick with 7-Zip; other compressors like bzip2 can be used as well. And with certain parameters you can lower the compression level to speed up the backup process.
    – Robert
    Feb 3, 2022 at 21:10

1 Answer


Tar does not do compression by itself, so its throughput is almost purely limited by IO capability. A hard drive can easily sustain 100 MB/s, so reading 4 GB and writing 4 GB back out should be doable in 80 seconds or so. SSDs can do hundreds of megabytes per second or more.
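As a quick illustration that the tar step alone is cheap, here is a minimal sketch using Python's stdlib tarfile module (the sample directory and file names are made up for the example):

```python
import os
import pathlib
import tarfile
import tempfile

# Build a small sample directory standing in for a /var/www/ site.
src = tempfile.mkdtemp()
pathlib.Path(src, "index.html").write_text("hello world\n" * 1000)

# Plain tar: no compression happens here, so the work is pure IO.
archive = os.path.join(tempfile.mkdtemp(), "site.tar")
with tarfile.open(archive, "w") as tf:
    tf.add(src, arcname=".")

print(os.path.getsize(archive) > 0)
```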

However, 7z does compression. Compression is basically CPU-bound, not storage-bound.

To benchmark 7z, you can run 7z b. On the laptop I'm writing this on, 7z handles about 20 MB/s, while my NVMe storage handles about 2 GB/s. That's two orders of magnitude of difference! Compressing 4 GB would take about 200 seconds; simply stuffing it into a tarball should take about 2 seconds.

Various compression algorithms have different tradeoffs. 7z can be configured to use different compression levels, in 10 steps (the -mx switch, 0 through 9).

You should experiment with different levels to find the correct speed-size tradeoff for your application.
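The same speed-for-size tradeoff can be sketched with Python's stdlib zlib module, used here purely as a stand-in for 7z's -mx levels (the sample data is invented):

```python
import zlib

# Roughly 4 MB of compressible stand-in data.
data = b"example web content " * 200_000

# Level 1 favors speed, level 9 favors compressed size;
# 7z exposes the same dial via -mx1 .. -mx9.
fast = zlib.compress(data, 1)
small = zlib.compress(data, 9)

print(len(fast), len(small))
```

Both results decompress back to the original data; the higher level simply spends more CPU time to shave off bytes.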

Is there a way I can reproduce what is happening here: tar cf - -C %s . 2>/dev/null 3>/dev/null directly in bash? Because clearly it's not the same as tar -cf compress-dir.tar /var/www/bigsite.com/, which runs much faster.

Sure, it's a shell command. os.system() simply runs the command you give it. The tar command in your example adds the directory %s to a tarball and writes the result to stdout; in your full command that stream is piped into 7z. To try it by hand, substitute the real values and drop the /dev/null redirections so any errors actually show up: tar cf - -C /var/www/bigsite.com . | 7za a -pYOURPASS -si backup.tar.7z

But as I said above: it's not tar that's slow. It's 7z.
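As an aside, the os.system() call in the question redirects stderr to /dev/null for both processes, so if 7za dies partway through (disk full, out of memory, a bad switch), the script fails silently, which matches the "stops growing with no error" symptom. A hedged sketch of the same pipeline using subprocess, keeping stderr attached so failures surface (the function name and arguments are placeholders, not the original script's API):

```python
import subprocess

def backup_folder(src_dir, password, out_file):
    """Pipe `tar` into `7za`, leaving stderr visible and checking exit codes."""
    tar = subprocess.Popen(
        ["tar", "cf", "-", "-C", src_dir, "."],
        stdout=subprocess.PIPE,
    )
    sevenzip = subprocess.run(
        ["7za", "a", "-p" + password, "-si", out_file],
        stdin=tar.stdout,
        stdout=subprocess.DEVNULL,  # silence normal output, keep stderr
    )
    tar.stdout.close()
    tar.wait()
    if tar.returncode != 0 or sevenzip.returncode != 0:
        raise RuntimeError(
            "backup failed: tar=%d 7za=%d"
            % (tar.returncode, sevenzip.returncode)
        )
```

Unlike os.system(), this raises an exception instead of silently leaving a truncated archive behind.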

