Paper information: Interactively shaping robot behaviour with unlabeled human instructions

In this paper, we propose a framework that enables a human teacher to shape a robot's behaviour by interactively providing it with unlabeled instructions, that is, instruction signals whose meaning is not known to the robot in advance. We ground the meaning of these signals in the task-learning process and, at the same time, use them to guide that process. We implement our framework as a modular architecture, named TICS (Task-Instruction-Contingency-Shaping), that combines three information sources: a predefined reward function, human evaluative feedback, and unlabeled instructions. This approach provides a novel perspective on robotic task learning that lies between the Reinforcement Learning and Supervised Learning paradigms. We evaluate our framework both in simulation and with a real robot. The experimental results show that the framework accelerates the task-learning process and reduces the number of teaching signals required.
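To make the idea of combining the three information sources more concrete, the following is a minimal sketch, not the TICS implementation. It assumes a hypothetical five-state corridor task, a simulated teacher that emits a single unlabeled signal (`sig_B`), simulated evaluative feedback, and a simplification in which feedback is simply added to the predefined reward inside a standard Q-learning update. All names (`step`, `teacher`, `feedback`, `train`, `meaning`) are illustrative assumptions and do not come from the paper.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]
GOAL = 4  # rightmost state of a hypothetical 5-state corridor


def step(state, action):
    """Toy environment: predefined reward of 1.0 on reaching the goal."""
    nxt = max(0, min(GOAL, state + (1 if action == "right" else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0)


def teacher(state):
    """Simulated teacher: emits a signal whose meaning the agent is not told."""
    return "sig_B"  # assumption: in this toy task it always means "right"


def feedback(action):
    """Simulated evaluative feedback: +1 for the intended action, else -1."""
    return 1.0 if action == "right" else -1.0


def train(episodes=30, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)        # task model: Q[(state, action)]
    meaning = defaultdict(float)  # instruction model: score[(signal, action)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            signal = teacher(state)
            # Use the signal only once some interpretation has positive evidence;
            # otherwise fall back to epsilon-greedy action selection on Q.
            guess = max(ACTIONS, key=lambda a: meaning[(signal, a)])
            if meaning[(signal, guess)] > 0:
                action = guess
            elif rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward = step(state, action)
            h = feedback(action)
            # Ground the signal: credit the (signal, action) pair with the feedback.
            meaning[(signal, action)] += h
            # Q-learning update on predefined reward plus feedback (a simplification).
            future = 0.0 if nxt == GOAL else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + h + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if state == GOAL:
                break
    return q, meaning


if __name__ == "__main__":
    q, meaning = train()
    print({a: meaning[("sig_B", a)] for a in ACTIONS})
```

In this sketch the agent interprets the signal and learns the task in the same loop: once an interpretation has accumulated positive feedback, the signal alone selects the action, so fewer exploratory steps are needed. How the actual architecture weighs and combines its modules is not specified in the abstract and is not reproduced here.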
