An Evolutionary Approach to Automatic Construction of the Structure in Hierarchical Reinforcement Learning

We plan to implement our method on real hardware. A foreseeable extension of this study is to generalize the method into a model of the cooperative and competitive mechanisms among learning modules in the brain.
