An Evolutionary Algorithm for Error-Driven Learning via Reinforcement

Although different learning systems must be coordinated to afford complex behavior, little is known about how this coordination occurs. This article describes a theoretical framework specifying how complex behaviors commonly thought to require error-driven learning might instead be acquired through simple reinforcement. The framework includes specific assumptions about the mechanisms that contribute to the evolution of (artificial) neural networks, generating topologies that allow the networks to learn large-scale, complex problems using only information about the quality of their performance. The practical and theoretical implications of the framework are discussed, as are possible biological analogs of the approach.
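The core idea, learning from nothing more than a scalar "quality of performance" signal, can be illustrated with a minimal sketch. The code below is not the article's model; it is a hypothetical (1+1) evolution strategy that adjusts the weights of a tiny fixed-topology network on the XOR task, accepting a mutation only when the scalar fitness improves. No per-output error signal or gradient is ever computed, which is the contrast with error-driven (backpropagation-style) learning that the framework draws.

```python
import math
import random

def evaluate(weights, samples):
    """Scalar performance signal: negative mean squared error on a toy task.
    Only this single number reaches the evolutionary loop."""
    total = 0.0
    for (x1, x2), target in samples:
        # Fixed 2-2-1 topology with tanh units; `weights` is a flat list of 9 values.
        h1 = math.tanh(weights[0] * x1 + weights[1] * x2 + weights[2])
        h2 = math.tanh(weights[3] * x1 + weights[4] * x2 + weights[5])
        out = math.tanh(weights[6] * h1 + weights[7] * h2 + weights[8])
        total += (out - target) ** 2
    return -total / len(samples)

def evolve(samples, generations=3000, sigma=0.3, seed=0):
    """(1+1) evolution strategy: perturb every weight with Gaussian noise and
    keep the child only if the scalar fitness does not get worse."""
    rng = random.Random(seed)
    parent = [rng.uniform(-1, 1) for _ in range(9)]
    best = evaluate(parent, samples)
    for _ in range(generations):
        child = [w + rng.gauss(0, sigma) for w in parent]
        score = evaluate(child, samples)
        if score >= best:
            parent, best = child, score
    return parent, best

# XOR: unsolvable by a single-layer network, but in principle learnable here
# from reinforcement alone (convergence depends on the random seed).
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
weights, fitness = evolve(xor)
```

Methods such as NEAT or CMA-ES (which evolve topologies or adapt the mutation distribution, respectively) refine this basic accept-if-better loop; the sketch keeps the topology fixed purely for brevity.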
