Reinforcement learning in feedback control

Technical process control is a highly interesting application area of high practical impact. Since classical controller design is, in general, a demanding task, this area constitutes a highly attractive domain for learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by cleverly exploiting information from interactions with the process, can acquire high-quality control behaviour from scratch.

This article presents four typical benchmark problems that highlight important and challenging aspects of technical process control: nonlinear dynamics, varying set-points, long-term dynamic effects, the influence of external variables, and the primacy of precision. We propose performance measures for controller quality that apply to both classical control design and learning controllers, capturing the precision, speed, and stability of the controller. A second set of key figures describes performance from the perspective of a learning approach, providing information about the efficiency of the method with respect to the learning effort required. For all four benchmark problems, extensive and detailed information is provided with which to carry out the evaluations outlined in this article.

A close evaluation of our own RL learning scheme, NFQCA (Neural Fitted Q Iteration with Continuous Actions), in accordance with the proposed scheme on all four benchmarks provides performance figures on both control quality and learning behaviour.
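The abstract does not reproduce the metric definitions themselves, so the following is only an illustrative sketch of how controller-agnostic key figures of this kind might be computed from a recorded closed-loop step response, assuming standard textbook definitions: steady-state error for precision, settling time into a tolerance band for speed, and maximum overshoot as a stability indicator. The function name and the 2% tolerance band are illustrative choices, not the paper's specification.

```python
import numpy as np

def controller_key_figures(t, y, setpoint, tol=0.02):
    """Illustrative controller-quality key figures from a step response.

    t, y     : time stamps and measured output of one closed-loop run
    setpoint : constant reference value after the step
    tol      : relative tolerance band (here 2%) used for settling time
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    band = tol * abs(setpoint)

    # Precision: mean absolute deviation over the final 10% of the run.
    n_tail = max(1, len(y) // 10)
    steady_state_error = float(np.mean(np.abs(y[-n_tail:] - setpoint)))

    # Speed: first time after which the output stays inside the band.
    inside = np.abs(y - setpoint) <= band
    settling_time = float(t[-1])  # worst case: never settles
    for i in range(len(y)):
        if inside[i:].all():
            settling_time = float(t[i])
            break

    # Stability indicator: maximum overshoot beyond the set-point.
    overshoot = float(max(0.0, np.max(y) - setpoint))

    return {"steady_state_error": steady_state_error,
            "settling_time": settling_time,
            "overshoot": overshoot}
```

The point of such controller-agnostic figures is that they can be computed identically for a classical baseline (e.g. a hand-tuned PID loop) and for a learned controller, making the two directly comparable.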
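NFQCA itself is specified in the authors' work and is not detailed in this abstract; as a rough orientation only, the following is a minimal sketch of the underlying fitted actor-critic idea on a fixed batch of transitions: a neural critic is re-fitted to frozen Q-targets, and a neural actor is improved by descending the critic's cost estimate. The network sizes, the use of Adam, the loop counts, and the cost-minimisation convention are simplifying assumptions of this sketch, not the authors' implementation (which, e.g., uses Rprop-based batch training).

```python
import torch
import torch.nn as nn

# Sketch of one fitted actor-critic iteration on a fixed batch of
# transitions (s, a, c, s'), where c is an immediate cost to minimise.
state_dim, action_dim, gamma = 4, 1, 0.95

critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64),
                       nn.Tanh(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(state_dim, 64),
                      nn.Tanh(), nn.Linear(64, action_dim), nn.Tanh())

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def fitted_iteration(s, a, c, s_next):
    """One fitted iteration over the whole transition batch."""
    # 1) Build frozen Q-targets: c + gamma * Q(s', pi(s')).
    with torch.no_grad():
        target = c + gamma * critic(torch.cat([s_next, actor(s_next)], dim=1))

    # 2) Re-fit the critic to the frozen targets (batch supervised learning).
    for _ in range(200):
        critic_opt.zero_grad()
        loss = nn.functional.mse_loss(
            critic(torch.cat([s, a], dim=1)), target)
        loss.backward()
        critic_opt.step()

    # 3) Improve the actor by backpropagating the critic's cost estimate
    #    through the actor network and descending it.
    for _ in range(200):
        actor_opt.zero_grad()
        actor_loss = critic(torch.cat([s, actor(s)], dim=1)).mean()
        actor_loss.backward()
        actor_opt.step()
```

Repeating `fitted_iteration` on a growing set of recorded transitions mirrors the data-efficient, batch-style learning that the abstract's learning-effort key figures are meant to quantify.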
