Reinforcement Learning for Mapping Instructions to Actions

In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function that defines the quality of the executed actions. During training, the learner repeatedly constructs action sequences for a set of documents, executes those actions, and observes the resulting reward. We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection. We apply our method to interpret instructions in two domains: Windows troubleshooting guides and game tutorials. Our results demonstrate that this method can rival supervised learning techniques while requiring few or no annotated training examples.
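
The abstract does not include an implementation, but as a minimal sketch of the kind of update it describes, the Python below shows a REINFORCE-style policy-gradient step for a log-linear action-selection model. The `env` interface, the `features` function, and the learning rate are illustrative assumptions, not the paper's actual system.

```python
import math
import random

def log_linear_probs(theta, state, actions, features):
    """Softmax over actions under a log-linear model: p(a|s) is proportional
    to exp(theta . phi(s, a)), where phi is the feature function."""
    scores = [sum(theta.get(f, 0.0) * v for f, v in features(state, a).items())
              for a in actions]
    m = max(scores)  # subtract max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_episode(theta, env, features, lr=0.1):
    """One training episode: sample an action sequence from the current policy,
    execute it, observe the reward, and take a policy-gradient step on theta.
    `env` is a hypothetical interface with reset/available_actions/step/reward."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        actions = env.available_actions(state)
        probs = log_linear_probs(theta, state, actions, features)
        a = random.choices(actions, weights=probs)[0]  # sample from the policy
        trajectory.append((state, actions, a))
        state, done = env.step(a)
    reward = env.reward()  # reward is observed only after executing the sequence
    # REINFORCE update for a log-linear policy:
    # grad log p(a|s) = phi(s, a) - E_p[phi(s, .)], scaled by the reward.
    for s, actions, a in trajectory:
        probs = log_linear_probs(theta, s, actions, features)
        phi_a = features(s, a)
        expected = {}
        for p, b in zip(probs, actions):
            for f, v in features(s, b).items():
                expected[f] = expected.get(f, 0.0) + p * v
        for f in set(phi_a) | set(expected):
            grad = phi_a.get(f, 0.0) - expected.get(f, 0.0)
            theta[f] = theta.get(f, 0.0) + lr * reward * grad
    return reward
```

In the paper's setting, the feature function might couple words in an instruction with attributes of a candidate GUI command or game action, and the reward might reflect whether execution reached the desired environment state; those specifics are assumptions here, not details taken from the abstract.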
