Toward Interpreting Spatial Language Discourse with Grounding Graphs

In order to interact naturally with humans, robots must be able to engage in natural language dialog with their human teammates. To do so, they need to model not only the state of the world but also the state of the interaction with the human user. In this paper we present an algorithm for understanding dialog in which information from multiple dialog turns is dynamically integrated into a probabilistic graphical model. Inference in the joint model enables the robot to find groundings in the world that correspond to the language across multiple dialog acts. To make inference efficient, we model the interaction state as a probabilistic model called Generalized Grounding Graphs (G³), which is dynamically instantiated from the natural language according to the hierarchical and compositional semantic structure of the utterances. Our previous work showed that a G³ model can find and execute plans corresponding to a single natural language command for mobile manipulation. This paper extends the G³ model to support multi-turn dialog interaction and demonstrates its application on several examples; a full implementation and evaluation remain future work. Unifying the representations of a single command and a dialog allows the training and inference algorithms to be the same for both problems.
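
To make the idea of joint inference across dialog turns concrete, the following is a minimal sketch and not the G³ implementation described in the paper. It assumes a hypothetical three-object world, hand-set factor scores in place of learned ones, and brute-force enumeration in place of efficient inference. Every name in it (`WORLD`, `type_factor`, `near_factor`, `turn_command`, `turn_clarification`, `figure_posterior`) is invented for illustration. Each turn contributes a factor over shared grounding variables, and the product of all factors is normalized to score candidate groundings.

```python
from math import dist, exp

# Hypothetical toy world: object id -> (type, (x, y) position).
WORLD = {
    "pallet_1": ("pallet", (0.0, 0.0)),
    "pallet_2": ("pallet", (9.0, 0.0)),
    "truck_1": ("truck", (10.0, 0.0)),
}


def type_factor(word, obj):
    """Hand-set score that a noun matches an object's type (assumption)."""
    return 0.9 if WORLD[obj][0] == word else 0.1


def near_factor(obj, landmark):
    """Hand-set score for 'near' that decays with distance (assumption)."""
    return exp(-dist(WORLD[obj][1], WORLD[landmark][1]) / 2.0)


def turn_command(figure, landmark):
    """Turn 1, 'Pick up the pallet': constrains only the figure."""
    return type_factor("pallet", figure)


def turn_clarification(figure, landmark):
    """Turn 2, 'the one near the truck': ties the figure to a landmark."""
    return type_factor("truck", landmark) * near_factor(figure, landmark)


def figure_posterior(turns):
    """Multiply the factors of all turns and marginalize out the landmark."""
    scores = {obj: 0.0 for obj in WORLD}
    for figure in WORLD:
        for landmark in WORLD:
            if landmark == figure:
                continue
            joint = 1.0
            for factor in turns:
                joint *= factor(figure, landmark)
            scores[figure] += joint
    total = sum(scores.values())
    return {obj: score / total for obj, score in scores.items()}


after_one_turn = figure_posterior([turn_command])
after_two_turns = figure_posterior([turn_command, turn_clarification])
print(after_one_turn)
print(after_two_turns)
```

With only the first turn, the two pallets receive equal probability, so the command is ambiguous. Adding the second turn's factor to the same model shifts most of the probability mass to `pallet_2`, the pallet closest to the truck. The same `figure_posterior` routine handles one turn or several, which mirrors the abstract's point that a single command and a dialog can share one inference procedure.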
