Evolutionary lossless compression with GP-ZIP

In this paper we propose a new approach for applying genetic programming (GP) to lossless data compression, based on combining well-known lossless compression algorithms. The file to be compressed is divided into chunks of a predefined length, and GP is asked to find the best possible compression algorithm for each chunk, in such a way as to minimise the total length of the compressed file. This technique is referred to as "GP-zip". The compression algorithms available to GP-zip (its function set) are: arithmetic coding (AC), Lempel-Ziv-Welch (LZW), unbounded prediction by partial matching (PPMD), run-length encoding (RLE), and Boolean minimization. In addition, two transformation techniques are available: the Burrows-Wheeler transformation (BWT) and move-to-front (MTF). In our experiments, we show that when the file to be compressed is composed of heterogeneous data fragments (as is the case, for example, in archive files), GP-zip is capable of achieving compression ratios that are superior to those obtained with well-known compression algorithms.
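
The chunk-wise idea described above can be illustrated with a minimal Python sketch. It is not the GP-zip implementation: the genetic programming search is replaced by simply trying every codec on every chunk, and the codecs are stand-ins from the Python standard library (zlib, bz2, lzma, plus an uncompressed "raw" option) rather than the function set listed in the abstract. The names `CODECS`, `compress_chunks`, `decompress_chunks`, and `total_length` are hypothetical and chosen only for this illustration.

```python
import bz2
import lzma
import zlib

# Stand-in codecs from the standard library (an assumption for this sketch;
# they are NOT the AC/LZW/PPMD/RLE/Boolean-minimization set from the abstract).
# Each entry maps a name to an (encode, decode) pair.
CODECS = {
    "raw": (lambda b: b, lambda b: b),  # leave the chunk uncompressed
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_chunks(data: bytes, chunk_len: int) -> list[tuple[str, bytes]]:
    """Split data into fixed-length chunks and keep the shortest encoding of each.

    Exhaustive per-chunk selection is used here in place of a GP search.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    plan = []
    for start in range(0, len(data), chunk_len):
        chunk = data[start:start + chunk_len]
        # Encode the chunk with every codec and keep the shortest payload.
        candidates = ((name, enc(chunk)) for name, (enc, _) in CODECS.items())
        plan.append(min(candidates, key=lambda item: len(item[1])))
    return plan


def decompress_chunks(plan: list[tuple[str, bytes]]) -> bytes:
    """Invert compress_chunks by decoding each chunk with its recorded codec."""
    return b"".join(CODECS[name][1](payload) for name, payload in plan)


def total_length(plan: list[tuple[str, bytes]]) -> int:
    """Total payload length in bytes, ignoring the per-chunk codec labels."""
    return sum(len(payload) for _, payload in plan)


if __name__ == "__main__":
    # Heterogeneous input: a highly repetitive run followed by a byte ramp.
    sample = b"a" * 1000 + bytes(range(256)) * 4
    plan = compress_chunks(sample, chunk_len=512)
    assert decompress_chunks(plan) == sample
    print([name for name, _ in plan], total_length(plan), len(sample))
```

Because the "raw" option is always among the candidates, the total payload length in this sketch never exceeds the input length. A real container format would also have to store the codec label and the payload length for every chunk, which this sketch leaves out.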