Reinforcement Learning for Mapping Instructions to Actions

In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function that defines the quality of the executed actions. During training, the learner repeatedly constructs action sequences for a set of documents, executes those actions, and observes the resulting reward. We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection. We apply our method to interpret instructions in two domains: Windows troubleshooting guides and game tutorials. Our results demonstrate that this method can rival supervised learning techniques while requiring few or no annotated training examples.
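
The abstract does not include an implementation, but as a minimal sketch of the kind of update it describes, the Python below shows a REINFORCE-style policy-gradient step for a log-linear action-selection model. The `env` interface, the `features` function, and the learning rate are illustrative assumptions, not the paper's actual system.

```python
import math
import random

def log_linear_probs(theta, state, actions, features):
    """Softmax over actions under a log-linear model: p(a|s) is proportional
    to exp(theta . phi(s, a)), where phi is the feature function."""
    scores = [sum(theta.get(f, 0.0) * v for f, v in features(state, a).items())
              for a in actions]
    m = max(scores)  # subtract max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_episode(theta, env, features, lr=0.1):
    """One training episode: sample an action sequence from the current policy,
    execute it, observe the reward, and take a policy-gradient step on theta.
    `env` is a hypothetical interface with reset/available_actions/step/reward."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        actions = env.available_actions(state)
        probs = log_linear_probs(theta, state, actions, features)
        a = random.choices(actions, weights=probs)[0]  # sample from the policy
        trajectory.append((state, actions, a))
        state, done = env.step(a)
    reward = env.reward()  # reward is observed only after executing the sequence
    # REINFORCE update for a log-linear policy:
    # grad log p(a|s) = phi(s, a) - E_p[phi(s, .)], scaled by the reward.
    for s, actions, a in trajectory:
        probs = log_linear_probs(theta, s, actions, features)
        phi_a = features(s, a)
        expected = {}
        for p, b in zip(probs, actions):
            for f, v in features(s, b).items():
                expected[f] = expected.get(f, 0.0) + p * v
        for f in set(phi_a) | set(expected):
            grad = phi_a.get(f, 0.0) - expected.get(f, 0.0)
            theta[f] = theta.get(f, 0.0) + lr * reward * grad
    return reward
```

In the paper's setting, the feature function might couple words in an instruction with attributes of a candidate GUI command or game action, and the reward might reflect whether execution reached the desired environment state; those specifics are assumptions here, not details taken from the abstract.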
