Control-Tutored Reinforcement Learning

We introduce a control-tutored reinforcement learning (CTRL) algorithm. The idea is to enhance tabular learning algorithms with a tutoring model-based control strategy that encodes limited knowledge of the plant, so as to improve exploration of the state space and substantially reduce learning times. We illustrate the benefits and effectiveness of the approach on a herding problem, in which one or more agents must drive a set of free-roving target agents in the plane into a goal region and contain them there.
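
The abstract does not spell out how the tutor and the learner interact, so the following is only a minimal sketch of one plausible arrangement, not the CTRL algorithm itself. It assumes a toy one-dimensional corridor as the plant, a hypothetical `tutor` feedback law that always steps toward the goal, and a hypothetical parameter `beta` giving the probability that an exploratory step defers to the tutor instead of acting at random. The learner is standard tabular Q-learning.

```python
import random

N_STATES = 10            # hypothetical 1-D corridor, cells 0..9 (assumption)
GOAL = N_STATES - 1      # goal cell at the right end
ACTIONS = (-1, +1)       # step left / step right


def step(state, action):
    """Toy plant: move along the corridor, clipped at the walls."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == GOAL else -0.01
    return nxt, reward, nxt == GOAL


def tutor(state):
    """Hypothetical model-based tutor: a crude law that steps toward the goal."""
    return +1 if state < GOAL else -1


def select_action(q, state, rng, epsilon, beta):
    """Epsilon-greedy, except an exploratory step defers to the tutor w.p. beta."""
    if rng.random() < epsilon:
        if rng.random() < beta:
            return tutor(state)
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties between equally valued actions at random.
    ties = [i for i, v in enumerate(q[state]) if v == best]
    return ACTIONS[rng.choice(ties)]


def train(episodes=200, alpha=0.5, gamma=0.95, epsilon=0.2,
          beta=0.8, max_steps=200, seed=0):
    """Tabular Q-learning with tutor-guided exploration (illustrative only)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = select_action(q, state, rng, epsilon, beta)
            nxt, reward, done = step(state, action)
            a_idx = ACTIONS.index(action)
            # Standard Q-learning target; no bootstrap from a terminal state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a_idx] += alpha * (target - q[state][a_idx])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Read the greedy action out of the Q-table for every state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES)]
```

Calling `greedy_policy(train())` returns the learned action for each cell. Setting `beta=0` recovers plain epsilon-greedy Q-learning, which gives a simple baseline against which the effect of the tutor on exploration can be compared.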
