Speed Data File Compression Using Pigz

Pigz ("pig-zee") is a parallel file compression utility that can substantially speed up compression on a batch or interactive node by spreading the work across multiple cores.

Usage

Pigz is a parallel implementation of gzip and a fully functional replacement for it. Pigz exploits multiple cores when compressing data files. By default, pigz uses all available cores on the node on which it is running.
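As a minimal sketch of what "replacement for gzip" means in practice, the script below compresses a small file and reads it back with plain gzip. The scratch directory and the sample.txt file are hypothetical, and the fallback to gzip when pigz is not installed is an assumption added only so the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: pigz writes standard gzip data, so plain gzip can read it back.
set -eu

workdir=$(mktemp -d)                     # hypothetical scratch directory
trap 'rm -rf "$workdir"' EXIT            # clean up when the script exits

printf 'sample results\n' > "$workdir/sample.txt"   # hypothetical test file

# Assumption: fall back to gzip when pigz is not installed.
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

# -c writes the compressed stream to stdout and leaves the input file in place.
"$compressor" -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"

# Decompress with plain gzip to confirm the two tools share one format.
roundtrip=$(gzip -dc "$workdir/sample.txt.gz")
echo "$roundtrip"
```

Because the output is ordinary gzip data, archives made with pigz can be unpacked on systems where only gzip is available.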

Syntax:

tar -cf filename.tar.gz --use-compress-program=pigz directory_name

Example 1: Using pigz with the tar command to compress a directory. This syntax uses all the cores available on the node to perform the compression. Do this only on a batch node.
tar -cf myfilename.tar.gz --use-compress-program=pigz myfilename

Example 2: Using pigz to compress files through a shell pipe. This method is slower, but can be used on the interactive nodes as long as the number of cores is kept low (x=4). The "-f -" argument tells tar to write the archive to standard output so that pigz can read it from the pipe.
tar -cf - myfilename | pigz -p x > myfilename.tar.gz

Note: Replace x with the number of cores pigz should use. A value of 4 seems optimal for interactive use.
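The core limit in Example 2 could be applied automatically with something like the sketch below. The function names pick_cores and compress_dir are hypothetical, and the limit of 4 simply follows the note above; getconf _NPROCESSORS_ONLN is assumed to be available, as it is on typical Linux systems.

```shell
#!/bin/sh
# pick_cores LIMIT: print the smaller of LIMIT and the online core count.
pick_cores() {
    limit=$1
    # getconf reports the number of online processors; assume 1 if it fails.
    online=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$online" -lt "$limit" ]; then
        echo "$online"
    else
        echo "$limit"
    fi
}

# compress_dir DIR: stream an uncompressed tar archive ("-f -" means stdout)
# into pigz, which is restricted to at most 4 cores.
compress_dir() {
    tar -cf - "$1" | pigz -p "$(pick_cores 4)" > "$1.tar.gz"
}
```

After defining these functions, running "compress_dir myfilename" would produce myfilename.tar.gz in the same way as Example 2.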

Recommendation

Copy your data to $CENTER, then log in to the Utility Server to post-process it. When post-processing is complete, start an interactive batch session (via "qsub -A $ACCOUNT -l") and execute the tar command in Example 1 above, or add it to your job script. Then archive the results.

Example 3: Compressing files using a job script.

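#!/bin/csh
#PBS -A $ACCOUNT
#PBS -N compress_results
# Work in the directory that holds the data to compress.
cd $CENTER
# Compress the myfilename directory with pigz, using all cores on the batch node.
tar -cf myfilename.tar.gz --use-compress-program=pigz myfilename
exit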

Then archive the results from an interactive node.

cp myfilename.tar.gz $ARCHIVE_HOME

or

/usr/bin/rcp myfilename.tar.gz ${ARCHIVE_HOST}:${ARCHIVE_HOME}/
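Before running either copy command above, it may be worth confirming that the compressed archive is readable. The following is a minimal sketch; the function name verify_archive is hypothetical, and it relies only on the standard gzip and tar tools, which can read pigz output.

```shell
#!/bin/sh
# verify_archive FILE: succeed only if FILE is intact gzip data
# that contains a readable tar archive.
verify_archive() {
    # gzip -t checks the integrity of the compressed stream without extracting it.
    gzip -t "$1" 2>/dev/null || return 1
    # tar -tzf lists the archive members; the listing is discarded so that
    # only the exit status is kept.
    tar -tzf "$1" >/dev/null 2>&1
}

# Possible use before archiving:
#   verify_archive myfilename.tar.gz && cp myfilename.tar.gz $ARCHIVE_HOME
```

The check reads the whole archive once, so on very large files it adds some time before the copy starts.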

Observations

Since pigz uses all available cores on a node by default, it should be invoked on a Utility Server batch node, NOT on an interactive node, unless the core count is limited with -p as in Example 2. The general idea is to use a dedicated batch node to generate compressed files faster than standard tar/gzip. Doing so also avoids impacting interactive users.