Training an Agent to Ground Commands with Reward and Punishment

As robots and autonomous assistants become more capable, there will be a greater need for humans to easily convey to agents the complex tasks they want them to carry out. Conveying tasks through natural language provides an intuitive interface that does not require any technical expertise, but implementing such an interface requires methods for the agent to learn a grounding of natural language commands. In this work, we demonstrate how high-level task groundings can be learned from a human trainer providing online reward and punishment. Grounding language to high-level tasks for the agent to solve removes the need for the human to specify low-level solution details in their command. Using reward and punishment for training makes the training procedure simple enough to be used by people without technical expertise, and it also allows a human trainer to immediately correct errors in interpretation that the agent has made. We present preliminary results from a single user training an agent in a simple simulated home environment and show that the agent can quickly learn a grounding of language such that it can successfully interpret new commands and execute them in a variety of different environments.
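To make the idea concrete, the following is a minimal, hypothetical sketch of grounding-by-feedback, not the paper's actual algorithm: the agent keeps an association weight for each (command word, candidate task) pair, interprets a command by picking the highest-scoring task, and nudges the weights up on reward and down on punishment. All names, the candidate task set, and the update rule are illustrative assumptions.

```python
# Hypothetical sketch: learning command-to-task groundings from reward
# and punishment. The update rule and task names are assumptions for
# illustration, not the method described in the paper.
from collections import defaultdict
import random


class FeedbackGrounder:
    def __init__(self, candidate_tasks, learning_rate=0.1):
        self.tasks = candidate_tasks  # e.g. high-level tasks the agent can solve
        self.lr = learning_rate
        # weights[(word, task)] -> association strength, starts at 0.0
        self.weights = defaultdict(float)

    def interpret(self, command):
        """Pick the candidate task whose words best match the command."""
        words = command.lower().split()
        scores = {t: sum(self.weights[(w, t)] for w in words) for t in self.tasks}
        best = max(scores.values())
        # break ties randomly so early training still explores
        return random.choice([t for t, s in scores.items() if s == best])

    def feedback(self, command, task, reward):
        """Reward (+1) strengthens the command-task link; punishment (-1) weakens it."""
        for w in command.lower().split():
            self.weights[(w, task)] += self.lr * reward


# Usage: the trainer issues a command, observes the agent's interpretation,
# and immediately rewards or punishes it.
grounder = FeedbackGrounder(["go_to_kitchen", "fetch_mug", "turn_on_light"])
cmd = "go to the kitchen"
task = grounder.interpret(cmd)
grounder.feedback(cmd, task, +1 if task == "go_to_kitchen" else -1)
```

Because the weights are tied to words rather than whole commands, a grounder of this shape can generalize to new commands that recombine familiar words, which matches the kind of transfer to new commands and environments the abstract reports.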
