Estimating the distribution and propagation of genetic programming building blocks through tree compression

Shin et al. [19] and McKay et al. [15] previously applied tree compression and semantics-based simplification to study the distribution of building blocks in evolving Genetic Programming (GP) populations. However, their method could only give static estimates of the degree of repetition of building blocks in one generation at a time, supplying no information about the flow of building blocks between generations. Here, we use a state-of-the-art tree compression algorithm, xmlppm, to estimate the extent to which frequent building blocks from one generation are still in use in a later generation. Whereas the earlier work compared the behaviour of different GP algorithms on one specific problem (a simple symbolic regression problem), we extend the analysis to a more complex symbolic regression problem, finding a Fourier approximation to a sawtooth wave, and to a Boolean domain, odd parity.
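The idea of using compression to estimate how much of one generation's material survives into a later one can be illustrated with a minimal sketch. This is not the authors' procedure: it swaps in the general-purpose `zlib` compressor for xmlppm, assumes individuals are serialised as S-expression strings, and uses hypothetical helper names (`compressed_size`, `conditional_size`, `reuse_ratio`). The estimate is the familiar conditional-compression one: the extra bytes needed to encode the later generation once the earlier one has been seen, compared with the bytes needed to encode it alone.

```python
import zlib


def compressed_size(population):
    """Compressed size in bytes of a population serialised one tree per line."""
    data = "\n".join(population).encode("utf-8")
    return len(zlib.compress(data, 9))


def conditional_size(earlier, later):
    """Extra bytes needed for `later` when `earlier` is already encoded."""
    return compressed_size(earlier + later) - compressed_size(earlier)


def reuse_ratio(earlier, later):
    """Share of `later`'s compressed size saved by having seen `earlier`.

    Values near 1 suggest heavy reuse of earlier material; values near 0
    (or below) suggest little shared structure.
    """
    return 1.0 - conditional_size(earlier, later) / compressed_size(later)


# Toy populations (illustrative only, not data from the paper).
gen_early = [
    "(add (mul x (sin x)) (sub x 1))",
    "(mul (mul x (sin x)) (add x x))",
    "(sub (cos (mul x (sin x))) x)",
    "(add (sub x 1) (cos x))",
]
gen_related = [
    "(add (mul x (sin x)) (cos x))",
    "(sub (mul x (sin x)) (sub x 1))",
    "(mul (cos (mul x (sin x))) (add x x))",
    "(add (sub x 1) (mul x (sin x)))",
]
gen_unrelated = [
    "(if (and d0 d1) (not d2) (or d3 d1))",
    "(or (nand d2 d0) (xor d1 d3))",
    "(xor (not d3) (if d0 d2 d1))",
    "(nand (or d1 d2) (and d3 d0))",
]

related_score = reuse_ratio(gen_early, gen_related)
unrelated_score = reuse_ratio(gen_early, gen_unrelated)
print("reuse vs related generation:  ", round(related_score, 3))
print("reuse vs unrelated generation:", round(unrelated_score, 3))
```

A byte-oriented compressor such as `zlib` only sees repeated substrings, so it is a crude proxy; a tree-aware compressor such as xmlppm conditions on the structural context of each node, which is why the paper relies on it.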

[1] Ian H. Witten, et al. Data Compression Using Adaptive Coding and Partial String Matching, 1984, IEEE Trans. Commun.

[2] M. O'Neill, et al. Grammatical evolution, 2001, GECCO '09.

[3] John R. Koza, et al. Genetic programming - on the programming of computers by means of natural selection, 1993, Complex adaptive systems.

[4] Aravind K. Joshi, et al. Tree Adjunct Grammars, 1975, J. Comput. Syst. Sci.

[5] Peter Grant, et al. Proceedings IEEE Data Compression Conference, 1991.

[6] Peter Nordin, et al. Evolving Turing-Complete Programs for a Register Machine with Self-modifying Code, 1995, ICGA.

[7] Abraham Lempel, et al. Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[8] Nguyen Xuan Hoai, et al. Building on Success in Genetic Programming: Adaptive Variation and Developmental Evaluation, 2007, ISICA.

[9] Nguyen Xuan Hoai, et al. Analysing the Regularity of Genomes Using Compression and Expression Simplification, 2007, EuroGP.

[10] Nichael Lynn Cramer, et al. A Representation for the Adaptive Generation of Simple Sequential Programs, 1985, ICGA.

[11] William B. Langdon, et al. Repeated Sequences in Linear Genetic Programming Genomes, 2005, Complex Syst.

[12] Nguyen Xuan Hoai, et al. Using compression to understand the distribution of building blocks in genetic programming populations, 2007, 2007 IEEE Congress on Evolutionary Computation.

[13] En-Hui Yang, et al. Estimating DNA sequence entropy, 2000, SODA '00.

[14] Ian H. Witten, et al. Protein is incompressible, 1999, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).

[15] A. Lindenmayer. Mathematical models for cellular interactions in development. I. Filaments with one-sided inputs, 1968, Journal of theoretical biology.

[16] Lee Spector, et al. Ontogenetic programming, 1996.

[17] William B. Langdon, et al. Repeated Patterns in Tree Genetic Programming, 2005, EuroGP.

[18] A. Lindenmayer. Mathematical models for cellular interactions in development. II. Simple and branching filaments with two-sided inputs, 1968, Journal of theoretical biology.

[19] Malcolm I. Heywood, et al. Context-Based Repeated Sequences in Linear Genetic Programming, 2005, EuroGP.

[20] John R. Koza, et al. Genetic programming 2 - automatic discovery of reusable programs, 1994, Complex Adaptive Systems.

[21] Nguyen Xuan Hoai, et al. Developmental Evaluation in Genetic Programming: The Preliminary Results, 2006, EuroGP.

[22] Nguyen Xuan Hoai, et al. Representation and structural difficulty in genetic programming, 2006, IEEE Transactions on Evolutionary Computation.

[23] James Cheney. Compressing XML with multiplexed hierarchical PPM models, 2001, Proceedings DCC 2001. Data Compression Conference.