Implicit bias and recursive grammar structures in estimation of distribution genetic programming

Much recent research on Estimation of Distribution Algorithms (EDAs) applied to Genetic Programming has adopted a model formalism based on Stochastic Context-Free Grammars (SCFGs). However, these methods generate biases that may be indistinguishable from selection bias, resulting in sub-optimal performance. The primary cause is the combined effect of recursion in the grammars and depth limits, which remove some sample trees from the distribution. Here, we demonstrate the bias and provide exact estimates of its scale (assuming infinite populations and simple recursions). We define a quantity h that determines both whether bias occurs (h > 1) and its scale. We apply this analysis to a number of simple illustrative grammars and to a range of GP grammars used in practice, showing that the bias is both real and important.
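
The interaction between recursion and depth limits can be illustrated with a small sketch. The toy grammar used here (S -> S S with probability p, S -> a with probability 1 - p) and the function name are assumptions chosen for illustration; they are not taken from the paper, and the code does not compute the paper's quantity h. Assuming an infinite population and no selection at all, it computes exactly the maximum-likelihood re-estimate of p from sampled trees that survive a depth limit, which shows how the recursive rule's probability can drift downward purely through sampling.

```python
def reestimated_recursion_prob(p, max_depth):
    """Re-estimate P(S -> S S) from depth-limited samples of a toy SCFG.

    Hypothetical toy grammar (an assumption, not from the paper):
        S -> S S   with probability p
        S -> a     with probability 1 - p
    Trees deeper than max_depth are discarded; the tree S -> a has depth 1.
    Infinite population, no selection.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")

    # q: probability that a sampled tree has depth <= d.
    # r: expected number of recursive-rule uses, counted only over
    #    trees of depth <= d (discarded trees contribute zero).
    q = 1.0 - p  # depth <= 1: only the tree S -> a
    r = 0.0
    for _ in range(max_depth - 1):
        # A recursive root survives only if both subtrees fit in depth d - 1.
        r = p * (q * q + 2.0 * q * r)
        q = (1.0 - p) + p * q * q

    # In a binary tree, leaves = internal nodes + 1, so the expected number
    # of terminal-rule uses over surviving trees is r + q.
    t = r + q
    return r / (r + t)


if __name__ == "__main__":
    for depth in (2, 5, 10, 20):
        print(depth, reestimated_recursion_prob(0.5, depth))
```

With p = 0.5 and a depth limit of 5, the re-estimated probability is about 0.30, even though no fitness-based selection has been applied. In an EDA loop this shift would be fed back into the model at every generation, where it could be mistaken for the effect of selection.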
