Paper Information - Augmented Reinforcement Learning for Interaction with Non-expert Humans in Agent Domains

Augmented Reinforcement Learning for Interaction with Non-expert Humans in Agent Domains

In application domains characterized by dynamic changes and non-deterministic action outcomes, it is frequently difficult for agents or robots to operate without any human supervision. Although human feedback can help an agent learn a rich representation of the task and domain, humans may not have the expertise or time to provide elaborate and accurate feedback in complex domains. Widespread deployment of intelligent agents hence requires that the agents operate autonomously using sensory inputs and limited high-level feedback from non-expert human participants. Towards this objective, this paper describes an augmented reinforcement learning framework that combines bootstrap learning and reinforcement learning principles. In the absence of human feedback, the agent learns by interacting with the environment. When high-level human feedback is available, the agent robustly merges it with environmental feedback by incrementally revising the relative contributions of the feedback mechanisms to the action choice policy. The framework is evaluated in two simulated domains: Tetris and Keepaway soccer.
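The merging idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the linear blending rule, the fixed step size, and the names `blend_scores`, `choose_action`, `update_weight`, and `w_human` are all assumptions made for illustration. It only shows one plausible way that the relative contributions of two feedback sources to the action choice could be revised incrementally.

```python
# Hypothetical sketch: blend per-action scores from environmental and human
# feedback, and revise the human weight incrementally. The linear blend and
# the update rule are illustrative assumptions, not the paper's method.

def blend_scores(env_scores, human_scores, w_human):
    """Return a weighted mix of per-action scores from the two sources."""
    if human_scores is None:
        # No human feedback available: rely on the environment alone.
        return dict(env_scores)
    return {
        action: (1.0 - w_human) * score + w_human * human_scores.get(action, 0.0)
        for action, score in env_scores.items()
    }


def choose_action(env_scores, human_scores, w_human):
    """Pick the action with the highest blended score."""
    blended = blend_scores(env_scores, human_scores, w_human)
    return max(blended, key=blended.get)


def update_weight(w_human, human_was_helpful, step=0.1):
    """Move the human weight toward 1 if the human signal helped, else toward 0."""
    target = 1.0 if human_was_helpful else 0.0
    return w_human + step * (target - w_human)


env = {"left": 0.2, "right": 0.5}      # scores learned from the environment
human = {"left": 1.0, "right": 0.0}    # high-level human preference

action_without_human = choose_action(env, None, w_human=0.5)
action_with_human = choose_action(env, human, w_human=0.5)
raised_weight = update_weight(0.5, human_was_helpful=True)
lowered_weight = update_weight(0.5, human_was_helpful=False)
```

With these made-up numbers, the agent follows the environment when no human signal is present and follows the human preference once the two sources are weighted equally. The weight then drifts up or down depending on whether the human signal turned out to be useful.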

Mohan Sridharan
