Efficient grounding of abstract spatial concepts for natural language interaction with robot platforms

Our goal is to develop models that allow a robot to efficiently understand, or “ground,” natural language instructions in the context of its world representation. Contemporary approaches estimate correspondences between language instructions and possible groundings such as objects, regions, and goals for actions that the robot should execute. However, these approaches typically reason over relatively small domains and do not model abstract spatial concepts such as “rows,” “columns,” or “groups” of objects; hence, they are unable to interpret an instruction such as “pick up the middle block in the row of five blocks.” In this paper, we introduce two new models for efficient natural language understanding of robot instructions. The first model, which we call the adaptive distributed correspondence graph (ADCG), is a probabilistic model for interpreting abstract concepts that require hierarchical reasoning over constituent concrete entities as well as notions of cardinality and ordinality. Abstract grounding variables form a Markov boundary over concrete groundings, effectively de-correlating them from the remaining variables in the graph. This structure reduces the complexity of both model training and inference. Inference in the model is posed as an approximate search procedure that orders factor computation so that the estimated probable concrete groundings focus the search for abstract concepts on likely hypotheses, pruning away improbable portions of the exponentially large space of abstractions. Further, to address scalability to complex domains, we introduce a second model, a hierarchical extension termed the hierarchical adaptive distributed correspondence graph (HADCG). This model retains the abstractions of the ADCG but first infers a coarse symbolic structure from the utterance and the environment model, then performs fine-grained inference over the reduced graphical model, further improving the efficiency of inference.
Empirical evaluation demonstrates accurate grounding of abstract concepts embedded in complex natural language instructions commanding a robotic torso and a mobile robot. Further, the proposed approximate inference method yields significant efficiency gains over the baseline with minimal trade-off in accuracy.
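The staged inference idea described above, scoring concrete groundings first and pruning improbable ones before searching the much larger space of abstract concepts built from them, can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring factor, the pruning threshold, and the object representation are all hypothetical stand-ins for the learned factors in the ADCG.

```python
# Illustrative sketch (not the paper's implementation): prune unlikely
# concrete groundings before enumerating abstract "group" hypotheses,
# so the exponential search over subsets runs over far fewer objects.
from itertools import combinations

def score_concrete(phrase, obj):
    """Toy stand-in for a learned factor: high score if the object's
    label is mentioned in the phrase, low score otherwise."""
    return 1.0 if obj["label"] in phrase else 0.1

def ground_abstract(phrase, objects, prune_threshold=0.5):
    # Stage 1: score each concrete grounding independently.
    scored = [(obj, score_concrete(phrase, obj)) for obj in objects]
    # Stage 2: prune improbable concrete groundings.
    survivors = [obj for obj, s in scored if s >= prune_threshold]
    # Stage 3: enumerate candidate abstract groupings (subsets of size
    # >= 2) only over the surviving concrete groundings.
    groups = []
    for k in range(2, len(survivors) + 1):
        groups.extend(combinations(survivors, k))
    return survivors, groups

objects = [
    {"label": "block", "id": 1},
    {"label": "block", "id": 2},
    {"label": "ball", "id": 3},
]
survivors, groups = ground_abstract("pick up the group of blocks", objects)
```

With three objects, an exhaustive search would consider every subset of size two or more; pruning the unmentioned "ball" leaves a single candidate group of the two blocks, mirroring how probable concrete groundings focus the abstract search.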
