Using compression to understand the distribution of building blocks in genetic programming populations

Compression algorithms generate a predictive model of data, using the model to reduce the number of bits required to transmit the data (in effect, transmitting only the differences from the model). As a consequence, the degree of compression achieved provides an estimate of the level of regularity in the data. Previous work has investigated the use of these estimates to understand the replication of building blocks within genetic programming (GP) individuals, and hence to understand how different GP algorithms promote the evolution of repeated common structure within individuals. Here, we extend this work to the population level, and use it to understand the extent of similarity between sub-structures within individuals in GP populations.
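The link between compressibility and shared structure can be illustrated with a minimal sketch. It uses Python's standard `zlib` (DEFLATE) purely as a stand-in compressor and makes no claim about the compressor or tree encoding used in the work itself. The s-expression strings, the random full-tree generator, and the names `compression_ratio` and `population_sharing` are all hypothetical, introduced only for illustration. The idea is that if compressing a whole population jointly takes far fewer bytes than compressing each individual separately, the individuals probably share sub-structures.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Bytes needed for the text after DEFLATE compression at the maximum level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower values mean more regularity."""
    return compressed_size(text) / len(text.encode("utf-8"))


def population_sharing(individuals: list[str]) -> float:
    """Joint compressed size divided by the sum of per-individual compressed sizes.

    Values well below 1 suggest that sub-structures recur across individuals.
    Part of the saving comes from fixed per-stream overhead, so the measure is
    best used to compare populations rather than read as an absolute value.
    """
    separate = sum(compressed_size(ind) for ind in individuals)
    joint = compressed_size("\n".join(individuals))
    return joint / separate


def random_expr(rng: random.Random, depth: int) -> str:
    """Hypothetical generator: a full binary expression tree in prefix notation."""
    if depth == 0:
        return rng.choice(["x", "y", "1", "2"])
    op = rng.choice(["+", "-", "*", "/"])
    return f"({op} {random_expr(rng, depth - 1)} {random_expr(rng, depth - 1)})"


rng = random.Random(0)
# A population of unrelated random individuals.
diverse = [random_expr(rng, 4) for _ in range(20)]
# A fully converged population: every individual is the same tree.
converged = [diverse[0]] * 20

print("diverse  :", round(population_sharing(diverse), 3))
print("converged:", round(population_sharing(converged), 3))
```

In this sketch the converged population should yield the lower score, since every individual after the first can be encoded as a back-reference to earlier text. A general-purpose byte-level compressor only sees the linearised string, so it may miss regularities that a tree-aware model would capture.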
