Approaching the Symbol Grounding Problem with Probabilistic Graphical Models

In order for robots to engage in dialog with human teammates, they must have the ability to map between words in language and aspects of the external world. A solution to this symbol grounding problem (Harnad, 1990) would enable a robot to interpret commands such as "Drive over to receiving and pick up the tire pallet." In this article we describe several of our results that use probabilistic inference to address the symbol grounding problem. Our specific approach is to develop models that factor according to the linguistic structure of a command. We first describe an early result, a generative model that factors according to the sequential structure of language, and then discuss our new framework, generalized grounding graphs (G3). The G3 framework dynamically instantiates a probabilistic graphical model for a natural language input, enabling a mapping between words in language and concrete objects, places, paths, and events in the external world. We report on corpus-based experiments in which the robot learns and uses word meanings in three real-world tasks: indoor navigation, spatial language video retrieval, and mobile manipulation.
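To make the idea of a grounding graph concrete, the sketch below illustrates the core operation the abstract describes: instantiating grounding variables for the linguistic constituents of a command and scoring candidate groundings with correspondence factors. This is a toy illustration under stated assumptions, not the G3 implementation; the factor functions, object names, and scoring here are hypothetical stand-ins for the learned log-linear factors over perceptual features used in the actual framework.

```python
# Toy G3-style grounding sketch (hypothetical factors; the real framework
# learns log-linear correspondence factors from an annotated corpus).

# Hypothetical world model: candidate groundings for a noun phrase.
OBJECTS = {"tire_pallet", "box_pallet", "truck"}

def noun_factor(phrase, obj):
    # Correspondence factor for a noun phrase: reward lexical overlap
    # between the phrase's words and the object's identifier.
    words = set(phrase.split())
    return sum(1.0 for w in words if w in obj)

def event_factor(verb, obj):
    # Correspondence factor for the verb: a hand-set constraint that
    # "pick up" applies to manipulable objects, not vehicles.
    return 1.0 if verb == "pick up" and obj != "truck" else 0.0

def ground(command_parse):
    # Instantiate a grounding variable for the noun phrase and take the
    # argmax over candidate groundings, summing the factor scores
    # (inference in the dynamically instantiated graphical model).
    verb, noun_phrase = command_parse
    best, best_score = None, float("-inf")
    for obj in OBJECTS:
        score = noun_factor(noun_phrase, obj) + event_factor(verb, obj)
        if score > best_score:
            best, best_score = obj, score
    return best

print(ground(("pick up", "the tire pallet")))
```

In the full framework the graph's structure mirrors the parse of the command (one grounding variable per constituent, with factors linking phrases to objects, places, paths, and events), and inference is performed jointly rather than per-variable as in this simplified argmax.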
