Teaching agents with corrective human feedback for challenging problems

COACH (COrrective Advice Communicated by Humans) is an interactive learning framework that allows non-expert humans to shape a policy through corrective advice, given as a binary signal in the agent's action domain. The original COACH formulation was tested on problems with one-dimensional action spaces, using linear models over RBF features to approximate the policy. In this paper the COACH framework is tested on two more complex learning problems with more than one dimension in the action domain: learning to ride a bicycle, and ball-dribbling with humanoid robots. Moreover, for the second problem, the COACH principles are extended to train a decision-making system that uses a fuzzy-based policy approximation. In both problems the performance of COACH is compared with that of other learning methods, and COACH obtains better results. The results show that COACH can successfully transfer human knowledge to agents with multi-dimensional continuous action domains, using different kinds of models.
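The idea of a binary corrective signal in the action domain can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a linear policy over normalized Gaussian RBF features, a two-dimensional action, and a fixed correction magnitude. The names rbf_features, corrective_update, error_magnitude and learning_rate are hypothetical and chosen for illustration only.

```python
import numpy as np

def rbf_features(state, centers, width):
    """Normalized Gaussian RBF activations for a scalar state (illustrative)."""
    phi = np.exp(-((state - centers) ** 2) / (2.0 * width ** 2))
    return phi / phi.sum()

def corrective_update(weights, phi, h, error_magnitude, learning_rate):
    """Shift each action dimension in the direction advised by the human.

    h holds one entry per action dimension: +1 (increase), -1 (decrease),
    or 0 (no advice). The fixed error_magnitude is an assumption of this sketch.
    """
    return weights + learning_rate * error_magnitude * np.outer(h, phi)

centers = np.linspace(-1.0, 1.0, 5)   # RBF centers over a 1-D state space
weights = np.zeros((2, 5))            # 2 action dimensions x 5 features
state = 0.2

phi = rbf_features(state, centers, width=0.5)
action_before = weights @ phi

# The human advises "increase" on the first action dimension only.
h = np.array([1.0, 0.0])
weights = corrective_update(weights, phi, h, error_magnitude=0.5, learning_rate=1.0)
action_after = weights @ phi
```

After the update, the first action component at this state moves in the advised direction while the second is unchanged, which is the behavior a per-dimension binary correction is meant to produce.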
