Genetic-Programming Based Prediction of Data Compression Saving

We use Genetic Programming (GP) to generate programs that predict the compression ratio that different compression algorithms will achieve on a file. GP evolves programs with multiple components. One component analyses statistical features extracted from the file's byte frequency distribution to produce a compression ratio prediction. A second component does the same using statistical features extracted from the file's raw ASCII representation. A third evolved component acts as a decision tree that determines the overall output (the compression ratio estimate) returned by an individual. The decision tree produces its result through a series of comparisons among statistical features extracted from the file and the outputs of the two prediction components. It can either select the output of one of the two prediction trees or integrate both outputs into an evolved mathematical formula. Experiments with the proposed approach show that GP is able to accurately estimate the compression ratio of unseen files, thereby avoiding the need to run multiple compression algorithms on a file to decide which one provides the best results.
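The three-component structure described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature choices (entropy, distinct byte count, mean, standard deviation), the two hand-written predictors, the thresholds in `decide`, and the convention that the ratio means compressed size divided by original size. In the actual approach these components are evolved by GP rather than written by hand.

```python
import math
from collections import Counter


def byte_frequency_features(data: bytes) -> dict:
    """Features of the byte frequency distribution (illustrative choice)."""
    if not data:
        raise ValueError("cannot extract features from empty data")
    counts = Counter(data)
    n = len(data)
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)  # bits per byte, 0..8
    return {"entropy": entropy, "distinct": len(counts)}


def raw_value_features(data: bytes) -> dict:
    """Features of the raw byte values themselves (illustrative choice)."""
    if not data:
        raise ValueError("cannot extract features from empty data")
    n = len(data)
    mean = sum(data) / n
    variance = sum((v - mean) ** 2 for v in data) / n
    return {"mean": mean, "std": math.sqrt(variance)}


def predict_from_frequencies(freq: dict) -> float:
    """Hand-written stand-in for the first evolved prediction tree."""
    return freq["entropy"] / 8.0


def predict_from_raw(raw: dict) -> float:
    """Hand-written stand-in for the second evolved prediction tree."""
    return min(1.0, raw["std"] / 128.0)


def decide(freq: dict, raw: dict, p1: float, p2: float) -> float:
    """Stand-in for the evolved decision tree: compare features, then
    return one predictor's output or a formula combining both."""
    if freq["distinct"] < 16:      # hypothetical threshold
        return p1
    if raw["std"] > 100.0:         # hypothetical threshold
        return p2
    return 0.5 * (p1 + p2)         # hypothetical combining formula


def estimate_ratio(data: bytes) -> float:
    """Estimated compressed/original size ratio, clamped to [0, 1]."""
    freq = byte_frequency_features(data)
    raw = raw_value_features(data)
    p1 = predict_from_frequencies(freq)
    p2 = predict_from_raw(raw)
    return max(0.0, min(1.0, decide(freq, raw, p1, p2)))
```

Calling `estimate_ratio(open(path, "rb").read())` once per candidate algorithm, each with its own evolved individual, would then allow the algorithms to be ranked without actually compressing the file.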
