The Challenges of Reinforcement Learning in Robotics and Optimal Control

Reinforcement Learning (RL) is an emerging technology for designing control systems that learn an optimal policy, through simulated or actual experience, according to a performance measure specified by the designer. This paper focuses on Q-learning, a widely used RL algorithm, and discusses how it can be applied to robotics and optimal control, where several key challenges must be addressed before it becomes practical. We discuss how Q-learning can be adapted to continuous state and action spaces, how reward functions can be designed to yield an adaptive optimal controller and accelerate learning, and how exploration can be kept safe.
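
For reference, the sketch below shows the standard tabular Q-learning update that the adaptations above build on. It is a minimal illustration, not the paper's implementation: the environment interface (env.reset, env.step), the action list, and the hyperparameter values are assumptions introduced here for the example.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One-step Q-learning:
       Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
       The env.reset()/env.step(action) interface is a hypothetical placeholder."""
    Q = defaultdict(float)  # Q-values indexed by (state, action); unseen pairs default to 0

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration: mostly exploit, occasionally act at random.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Temporal-difference target uses the greedy value of the next state;
            # the bootstrap term is dropped on terminal transitions.
            best_next = max(Q[(next_state, a)] for a in actions)
            td_target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (td_target - Q[(state, action)])

            state = next_state
    return Q
```

With a discretized state and action set, this loop applies directly; for continuous state and action spaces, the table lookup Q[(state, action)] would be replaced by a function approximator, which is one of the adaptations discussed in this paper.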
