Reinforced Genetic Programming

This paper introduces the Reinforced Genetic Programming (RGP) system, which enhances standard tree-based genetic programming (GP) with reinforcement learning (RL). RGP adds a new element to the GP function set: monitored action-selection points that provide hooks to a reinforcement-learning system. Using strong typing, RGP can restrict these choice points to leaf nodes, thereby turning GP trees into classify-and-act procedures. Environmental reinforcements channeled back through the choice points then provide the basis for both lifetime learning and general GP fitness assessment. This paves the way for evolutionary acceleration via both Baldwinian and Lamarckian mechanisms. In addition, the hybrid approach hints at potential improvements to RL, by exploiting evolution to design proper abstraction spaces via the problem-state classifications of the internal tree nodes. This paper details the basic mechanisms of RGP and demonstrates its application on a series of static and dynamic maze-search problems.
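The core mechanism described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation: internal nodes classify the problem state and route execution downward, while each leaf is a "choice point" that selects among actions with a simple value-based RL rule (here, epsilon-greedy selection with an incremental value update; the paper's actual RL machinery may differ). Reinforcement received from the environment is channeled back to the leaf that acted, giving each classified state region its own learned action preferences.

```python
import random


class ChoicePoint:
    """Leaf node: a monitored action-selection hook into the RL system.

    Keeps per-action value estimates and updates them from reward
    (a simple incremental update, assumed here for illustration).
    """

    def __init__(self, actions, epsilon=0.1, alpha=0.5):
        self.q = {a: 0.0 for a in actions}
        self.epsilon = epsilon
        self.alpha = alpha
        self.last_action = None

    def act(self, state, rng):
        # Epsilon-greedy selection over this leaf's action values.
        if rng.random() < self.epsilon:
            self.last_action = rng.choice(list(self.q))
        else:
            self.last_action = max(self.q, key=self.q.get)
        return self, self.last_action

    def reinforce(self, reward):
        # Move the chosen action's value estimate toward the reward.
        a = self.last_action
        self.q[a] += self.alpha * (reward - self.q[a])


class Condition:
    """Internal node: classifies the problem state and routes to a subtree."""

    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate
        self.if_true = if_true
        self.if_false = if_false

    def act(self, state, rng):
        branch = self.if_true if self.predicate(state) else self.if_false
        return branch.act(state, rng)


# Toy classify-and-act tree: the root splits states by parity, and each
# leaf learns its own best action during the individual's "lifetime".
rng = random.Random(0)
tree = Condition(lambda s: s % 2 == 0,
                 ChoicePoint(["left", "right"]),
                 ChoicePoint(["left", "right"]))

# Lifetime learning loop: reward "left" in even states, "right" in odd.
total_reward = 0.0
for step in range(200):
    state = rng.randrange(10)
    leaf, action = tree.act(state, rng)
    correct = "left" if state % 2 == 0 else "right"
    reward = 1.0 if action == correct else 0.0
    leaf.reinforce(reward)          # reinforcement flows back through the leaf
    total_reward += reward          # accumulated reward can feed GP fitness
```

The accumulated reward doubles as a fitness signal for the evolutionary layer, which is how lifetime learning can accelerate evolution (Baldwinian if only fitness is affected, Lamarckian if the learned action values are written back into the genome).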
