10 Steps and Some Tricks to Set up Neural Reinforcement Controllers

The paper discusses the steps necessary to set up a neural reinforcement controller for successfully solving typical (real world) control tasks. The major intention is to provide a code of practice of crucial steps that show how to transform control task requirements into the specification of a reinforcement learning task. Thereby, we do not necessarily claim that the way we propose is the only one (this would require a lot of empirical work, which is beyond the scope of the paper), but wherever possible we try to provide insights why we do it the one way or the other. Our procedure of setting up a neural reinforcement learning system worked well for a large range of real, realistic or benchmark-style control applications.

[1]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Vol. II , 1976 .

[2]  C. Watkins Learning from delayed rewards , 1989 .

[3]  Martin A. Riedmiller,et al.  A direct adaptive method for faster backpropagation learning: the RPROP algorithm , 1993, IEEE International Conference on Neural Networks.

[4]  Ivan Bratko,et al.  Learning to control dynamic systems , 1995 .

[5]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[6]  Richard S. Sutton,et al.  Generalization in ReinforcementLearning : Successful Examples UsingSparse Coarse , 1996 .

[7]  Martin A. Riedmiller Learning to Control Dynamic Systems , 1996 .

[8]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[9]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[10]  Richard S. Sutton,et al.  Dimensions of Reinforcement Learning , 1998 .

[11]  Martin A. Riedmiller,et al.  Reinforcement learning on an omnidirectional mobile robot , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[12]  Ieee Robotics Proceedings, 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), October 27-31,2003, Las Vegas, Nevada , 2003 .

[13]  Martin A. Riedmiller Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.

[14]  Martin A. Riedmiller Neural reinforcement learning to swing-up and balance a real pole , 2005, 2005 IEEE International Conference on Systems, Man and Cybernetics.

[15]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[16]  Stefan Schaal,et al.  Natural Actor-Critic , 2003, Neurocomputing.

[17]  Joost N. Kok Machine Learning: ECML 2007, 18th European Conference on Machine Learning, Warsaw, Poland, September 17-21, 2007, Proceedings , 2007, ECML.

[18]  Martin A. Riedmiller,et al.  On Experiences in a Complex and Competitive Gaming Domain: Reinforcement Learning Meets RoboCup , 2007, 2007 IEEE Symposium on Computational Intelligence and Games.

[19]  Thomas J. Walsh,et al.  Planning and Learning in Environments with Delayed Feedback , 2007, ECML.

[20]  Martin A. Riedmiller,et al.  Neural Reinforcement Learning Controllers for a Real Robot Application , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[21]  Martin A. Riedmiller,et al.  Learning to Drive a Real Car in 20 Minutes , 2007, 2007 Frontiers in the Convergence of Bioscience and Information Technologies.

[22]  S. Timmer,et al.  Fitted Q Iteration with CMACs , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.

[23]  Steffen Udluft,et al.  Safe exploration for reinforcement learning , 2008, ESANN.

[24]  Martin A. Riedmiller,et al.  ADAPTIVE REACTIVE JOB-SHOP SCHEDULING WITH REINFORCEMENT LEARNING AGENTS , 2008 .

[25]  Martin Lauer,et al.  Learning to dribble on a real robot by success and failure , 2008, 2008 IEEE International Conference on Robotics and Automation.

[26]  Carl E. Rasmussen,et al.  Gaussian process dynamic programming , 2009, Neurocomputing.

[27]  Martin A. Riedmiller,et al.  The Neuro Slot Car Racer: Reinforcement Learning in a Real World Setting , 2009, 2009 International Conference on Machine Learning and Applications.

[28]  Martin A. Riedmiller,et al.  Reinforcement learning for robot soccer , 2009, Auton. Robots.

[29]  Martin A. Riedmiller,et al.  Deep auto-encoder neural networks in reinforcement learning , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[30]  Martin A. Riedmiller,et al.  Deep learning of visual control policies , 2010, ESANN.

[31]  Martin A. Riedmiller,et al.  Reinforcement learning in feedback control , 2011, Machine Learning.

[32]  Martin A. Riedmiller,et al.  Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).

[33]  Martin A. Riedmiller,et al.  Autonomous reinforcement learning on raw visual input data in a real world application , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[34]  Martin A. Riedmiller,et al.  Distributed policy search reinforcement learning for job-shop scheduling tasks , 2012 .

[35]  Grgoire Montavon,et al.  Neural Networks: Tricks of the Trade , 2012, Lecture Notes in Computer Science.

[36]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[37]  Martin A. Riedmiller,et al.  A learned feature descriptor for object recognition in RGB-D data , 2012, 2012 IEEE International Conference on Robotics and Automation.