Neuroevolutionary reinforcement learning for generalized control of simulated helicopters

This article presents an extended case study in the application of neuroevolution to generalized simulated helicopter hovering, an important challenge problem for reinforcement learning. While neuroevolution is well suited to coping with the domain’s complex transition dynamics and high-dimensional state and action spaces, the need to explore efficiently and learn on-line poses unusual challenges. We propose and evaluate several methods for three increasingly challenging variations of the task, including the method that won first place in the 2008 Reinforcement Learning Competition. The results demonstrate that (1) neuroevolution can be effective for complex on-line reinforcement learning tasks such as generalized helicopter hovering, (2) neuroevolution excels at finding effective helicopter hovering policies but not at learning helicopter models, (3) due to the difficulty of learning reliable models, model-based approaches to helicopter hovering are feasible only when domain expertise is available to aid the design of a suitable model representation, and (4) recent advances in efficient resampling can enable neuroevolution to tackle more aggressively generalized reinforcement learning tasks.
