
Compress and decompress files with tar in Linux – CloudSavvy IT




Tar is more than just an archiving utility: it comes with some great built-in features that let you compress and decompress files while archiving them. Learn all about it in this article!

What’s tar and how do I install it?

According to the tar manual (which you can access by typing man tar once tar is installed), tar is an archiving utility. It supports many functions, including compressing and decompressing files directly while archiving. Let’s start by installing tar:

To install tar on your Debian / Apt based Linux distribution (such as Ubuntu and Mint), run the following command in your terminal:

sudo apt install tar

To install tar on your RedHat / Yum based Linux distribution (such as RHEL, CentOS and Fedora), run the following command in your terminal:

sudo yum install tar

Next, we’ll create some sample data:

mkdir test; cd test
touch a b c d e f 
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

Setting up sample data to compress

Here we created a directory named test and created six empty files in it using the touch command. We also added some data to files a, e, and b; notably, file b contains repetitive data that compresses well.

If you want to learn more about how compression works, see the How does file compression work? article.

Create an uncompressed archive

Simple, uncompressed tar archive creation

tar -hcf all_files.tar *
ls -l | grep -v total | awk '{print $5"\tbytes for: "$9}' | sort -n

Here we have created an uncompressed archive with the tar -hcf all_files.tar * command. Let’s take a look at the options used in this command.

First, we have -h. Although it is not required in this particular case, I highly recommend that you always include it in your tar commands. This option stands for dereference: tar will follow symbolic links and archive the files they point to, rather than the links themselves.
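To see what -h changes, here is a minimal sketch, assuming a GNU-compatible tar. The file names target.txt and link.txt are made up for the demonstration. It archives the same symbolic link with and without -h and then lists both archives.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo 'real content' > target.txt
ln -s target.txt link.txt            # symbolic link pointing at target.txt

tar -cf  without_h.tar link.txt      # archives the link itself
tar -hcf with_h.tar    link.txt      # follows the link and archives the file data

# In a verbose listing the first character is 'l' for a symbolic link
# and '-' for a regular file.
tar -tvf without_h.tar
tar -tvf with_h.tar
```

The first listing should show link.txt as a link to target.txt, while the second should show it as a regular file containing the data.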

Next, we have the -c and -f options. Note that they share the single - in front of -h, i.e. instead of specifying another - for each option, we simply tag them onto the other shorthand options. Fast and easy.

The -c option stands for create a new archive. Note that folders are archived recursively by default unless the --no-recursion option is also used. The -f option allows us to specify the name of the archive. It should come last in our options chain (because it requires an argument) so that we can add the archive filename right after it. Using tar -fch test.tar * will not work:

Shorthand options that require an argument cannot be placed first
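The ordering rule can be sketched as follows, again assuming a GNU-compatible tar. The archive names (bundled.tar, separate.tar, long.tar, broken.tar) are hypothetical. The first three commands are equivalent ways of writing the same options; the last one fails because -f swallows "ch" as the archive name and no operation such as -c is left.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a b c

tar -hcf bundled.tar a b c                          # bundled short options, -f last
tar -h -c -f separate.tar a b c                     # the same options given one by one
tar --dereference --create --file=long.tar a b c    # the same options in long form

# With -fch, tar reads "ch" as the argument of -f, so no operation is given.
if tar -fch broken.tar a b c 2>/dev/null; then
    echo "tar -fch unexpectedly succeeded"
else
    echo "tar -fch failed, as described above"
fi
```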

After the tar archive is generated, we use a custom ls output that clearly shows us the number of bytes per file. As you can see, the tar file is much larger than all of our files combined. The files are only archived, not compressed, and tar adds a fixed amount of overhead of its own.
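Where does the overhead come from? The tar format stores a 512-byte header for every file and pads each file's data to a multiple of 512 bytes, and GNU tar by default additionally pads the whole archive to a multiple of 10240 bytes. The sketch below assumes GNU tar defaults and recreates the sample data in a temporary directory; the variable names are only for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

tar -hcf all_files.tar [a-f]

payload=$(cat [a-f] | wc -c)         # bytes of actual file content
archive=$(wc -c < all_files.tar)     # bytes the archive occupies
blocks=$(( archive / 512 ))          # number of 512-byte blocks in the archive

echo "payload: $payload bytes"
echo "archive: $archive bytes ($blocks blocks of 512 bytes)"
```

With only a handful of tiny files, almost the entire archive consists of headers and padding, which is why it is so much larger than the data it holds.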

As an interesting side note, we can also check what type of file we are dealing with simply by using the file command at the command prompt:

file c
file b
file all_files.tar

Use file to view file type

Create a compressed archive

A commonly used compression algorithm is GZIP. Let’s add the corresponding option (-z) to our set of shorthand command line options and see how it affects the file size:

tar -zhcf all_files.tar.gz [a-f]
ls -l | grep -v total | awk '{print $5"\tbytes for: "$9}' | sort -n

Looking at the size of a compressed archive versus an uncompressed archive

This time we specified a shell pattern ([a-f], a bracket expression that works much like a regular-expression character class) to include only the files named a to f. This prevents the tar command from including the all_files.tar file in the new all_files.tar.gz file!

See the How do you actually use Regex? and Edit text using regular expressions with sed articles if you want to learn more about regular expressions.

We also added the -z option, which uses GZIP to compress the resulting tar archive. It’s great to see that we end up with a 186-byte file, which tells us that, in this case, the tar header and padding overhead of about 10 KB compresses very well.

The compressed archive is still 7.44 times the total size of the files, but that matters little: this artificial example is not representative of compressing large files, where you will almost always see a gain rather than a loss, unless the data has already been compressed or is in a format that does not condense well with any algorithm. Still, one algorithm (such as GZIP) may do better than another (such as BZIP2), and vice versa, for different datasets.
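To illustrate that point with larger files, here is a rough sketch that compresses about 1 MB of repetitive text and about 1 MB of random bytes. The file names repetitive.txt and random.bin are invented for this example, and random data stands in for data that is already compressed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Roughly 1 MB of highly repetitive text and 1 MB of random bytes.
yes 'the same line over and over' | head -c 1000000 > repetitive.txt
head -c 1000000 /dev/urandom > random.bin

for f in repetitive.txt random.bin; do
    tar -hcf  "$f.tar"    "$f"       # uncompressed archive
    tar -zhcf "$f.tar.gz" "$f"       # GZIP-compressed archive
    echo "$f: tar=$(wc -c < "$f.tar") bytes, tar.gz=$(wc -c < "$f.tar.gz") bytes"
done
```

The repetitive file should shrink to a tiny fraction of its size, while the random file should stay at roughly its original size.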

Save more bytes using a higher compression level

Can we make the file even smaller? Yes. We can select GZIP’s maximum compression level by using the -I option to tar, which allows us to specify the compression program to use (thanks to stackoverflow user ideasman42):

tar -I 'gzip -9' -hcf all_files.tar.gz [a-f]
ls -l | grep -v total | awk '{print $5"\tbytes for: "$9}' | sort -n

Use the -I option to tar to specify a compression program

Here we have specified -I 'gzip -9' as the compression program to use, and we have dropped the -z option (since we are now naming a specific program and its options instead of using tar’s built-in GZIP setting). As a result, the archive is 12 bytes smaller, thanks to a better (but generally slower) compression attempt (at level -9) by GZIP.

In general, the faster the compression (a lower compression level, e.g. -1), the larger the resulting file. And the slower the compression (a higher compression level, e.g. -9), the smaller the file. You can set your own preference by varying the compression level from -1 (fast) to -9 (slow).
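The trade-off can be sketched with a small loop, assuming GNU tar (whose -I option accepts a program together with its options). The sample file numbers.txt is made up for this example and holds about 2 MB of numbered lines.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 300000 > numbers.txt           # sample data, roughly 2 MB of text

for level in 1 6 9; do
    # Pass the gzip level through tar's -I (use-compress-program) option.
    tar -I "gzip -$level" -hcf "numbers.$level.tar.gz" numbers.txt
    echo "gzip -$level: $(wc -c < "numbers.$level.tar.gz") bytes"
done
```

Prefixing the tar command with time would show the other side of the trade-off, although on a file this small the difference is likely to be slight.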

Other compression programs

There are two other common compression algorithms that you can research and test: bzip2, which is selected with the -j option to tar, and XZ, which is selected with the -J option. Different algorithms give different size results and may offer additional compression options of their own.

Alternatively, you can use the -I option to set the maximum compression level for bzip2 (-9):

bzip2 -9 compression program example

And -9e for xz:

xz -9e compression program example

In this case, the results are not as good as with the somewhat standard GZIP algorithm. Still, the bzip2 and xz algorithms may show improvements with other data sets.
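To reproduce the comparison yourself, the sketch below mirrors the gzip -9 command from earlier for bzip2 and xz. The archive names all_files.tar.bz2 and all_files.tar.xz are assumptions, as is the use of GNU tar, and each program is only used if it is installed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

tar -I 'gzip -9' -hcf all_files.tar.gz [a-f]

# bzip2 and xz are not installed everywhere, so check before using them.
if command -v bzip2 >/dev/null 2>&1; then
    tar -I 'bzip2 -9' -hcf all_files.tar.bz2 [a-f]
fi
if command -v xz >/dev/null 2>&1; then
    tar -I 'xz -9e' -hcf all_files.tar.xz [a-f]
fi

# Same size listing as before, limited to the archives just created.
ls -l all_files.tar.* | awk '{print $5"\tbytes for: "$9}' | sort -n
```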

Decompress a file

Decompressing a file is super easy regardless of the method originally used to compress it, provided the matching compression program is installed on your computer. For example, if the archive was compressed with bzip2 (denoted by a .bz2 extension on the tar filename), you will want to run sudo apt install bzip2 (or sudo yum install bzip2) on the computer that needs to decompress the file.

rm a b c d e f
tar -xf all_files.tar.gz
ls

Decompression of a compressed (or uncompressed) tar archive

We simply specify -x to extract (and decompress) our all_files.tar.gz file, and specify the file name with the -f shorthand option as before.
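As a slightly more cautious variation, the sketch below lists the archive with -t before extracting it, and extracts into a separate directory with -C so the originals can be compared against the restored copies. The directory name restored is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b
tar -zhcf all_files.tar.gz [a-f]

tar -tf all_files.tar.gz                 # -t lists the contents without extracting

mkdir restored
tar -xf all_files.tar.gz -C restored     # -C extracts into the given directory

# Compare every restored file against its original.
for f in [a-f]; do
    cmp -s "$f" "restored/$f" && echo "$f: identical"
done
```

Extracting into a separate directory also avoids overwriting files of the same name in the current directory.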

By compressing files, you can save a lot of space on your storage devices, and knowing how to use tar together with the available compression options will help you do that. Once the archive needs to be extracted again, it is easy to do so, provided that the appropriate decompression software is available on the computer used to extract the data from your archive. Enjoy!

