Inferring Compact Representations for Efficient Natural Language Understanding of Robot Instructions

The speed and accuracy with which robots are able to interpret natural language is fundamental to realizing effective human-robot interaction. A great deal of attention has been paid to developing models and approximate inference algorithms that improve the efficiency of language understanding. However, existing methods still attempt to reason over a representation of the environment that is flat and unnecessarily detailed, which limits scalability. An open problem is then to develop methods capable of producing the most compact environment model sufficient for accurate and efficient natural language understanding. We propose a model that leverages environment-related information encoded within instructions to identify the subset of observations and perceptual classifiers necessary to perceive a succinct, instruction-specific environment representation. The framework uses three probabilistic graphical models trained from a corpus of annotated instructions to infer salient scene semantics, perceptual classifiers, and grounded symbols. Experimental results on two robots operating in different environments demonstrate that by exploiting the content and the structure of the instructions, our method learns compact environment representations that significantly improve the efficiency of natural language symbol grounding.

[1]  Terry Winograd,et al.  Procedures As A Representation For Data In A Computer Program For Understanding Natural Language , 1971 .

[2]  Matthew R. Walter,et al.  Learning models for following natural language directions in unknown environments , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[3]  Deb Roy,et al.  Conversational Robots: Building Blocks for Grounding Word Meaning , 2003, HLT-NAACL 2003.

[4]  Nicholas Roy,et al.  Efficient grounding of abstract spatial concepts for natural language interaction with robot platforms , 2018, Int. J. Robotics Res..

[5]  Joyce Yue Chai,et al.  Interactive Learning of Grounded Verb Semantics towards Human-Robot Communication , 2017, ACL.

[6]  Matthew R. Walter,et al.  Approaching the Symbol Grounding Problem with Probabilistic Graphical Models , 2011, AI Mag..

[7]  Patric Jensfelt,et al.  Large-scale semantic mapping and reasoning with heterogeneous modalities , 2012, 2012 IEEE International Conference on Robotics and Automation.

[8]  Peter Stone,et al.  Learning to Interpret Natural Language Commands through Human-Robot Dialog , 2015, IJCAI.

[9]  Wolfram Burgard,et al.  Conceptual spatial representations for indoor mobile robots , 2008, Robotics Auton. Syst..

[10]  Benjamin Kuipers,et al.  Local metrical and global topological maps in the hybrid spatial semantic hierarchy , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[11]  Paul Newman,et al.  Highly scalable appearance-only SLAM - FAB-MAP 2.0 , 2009, Robotics: Science and Systems.

[12]  Wolfram Burgard,et al.  Nonlinear Constraint Network Optimization for Efficient Map Learning , 2009, IEEE Transactions on Intelligent Transportation Systems.

[13]  Nicholas Roy,et al.  Efficient Grounding of Abstract Spatial Concepts for Natural Language Interaction with Robot Manipulators , 2016, Robotics: Science and Systems.

[14]  Ross A. Knepper,et al.  Asking for Help Using Inverse Semantics , 2014, Robotics: Science and Systems.

[15]  Matthew R. Walter,et al.  Learning Semantic Maps from Natural Language Descriptions , 2013, Robotics: Science and Systems.

[16]  Benjamin Kuipers,et al.  Factoring the Mapping Problem: Mobile Robot Map-building in the Hybrid Spatial Semantic Hierarchy , 2010, Int. J. Robotics Res..

[17]  Luke S. Zettlemoyer,et al.  Learning to Parse Natural Language Commands to a Robot Control System , 2012, ISER.

[18]  Matthew R. Walter,et al.  Learning spatial-semantic representations from natural language descriptions and scene classifications , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[19]  Stefanie Tellex,et al.  A natural language planner interface for mobile manipulators , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[20]  Jean Oh,et al.  Inferring Maps and Behaviors from Natural Language Instructions , 2015, ISER.

[21]  Luc Steels,et al.  Co-Acquisition of Syntax and Semantics - An Investigation in Spatial Language , 2015, IJCAI.

[22]  Stefanie Tellex,et al.  Toward understanding natural language directions , 2010, 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[23]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Matthew R. Walter,et al.  On the performance of hierarchical distributed correspondence graphs for efficient symbol grounding of robot instructions , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[25]  Stefanie Tellex,et al.  Toward Information Theoretic Human-Robot Dialog , 2012, Robotics: Science and Systems.

[26]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[27]  Benjamin Kuipers,et al.  Using the topological skeleton for scalable global metrical map-building , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).

[28]  Benjamin Kuipers,et al.  The Spatial Semantic Hierarchy , 2000, Artif. Intell..

[29]  Stefanie Tellex,et al.  Interpreting and Executing Recipes with a Cooking Robot , 2012, ISER.

[30]  Matthew R. Walter,et al.  Exactly Sparse Extended Information Filters for Feature-based SLAM , 2007, Int. J. Robotics Res..

[31]  Thomas M. Howard,et al.  Language-Guided Adaptive Perception for Efficient Grounded Communication with Robotic Manipulators in Cluttered Environments , 2018, SIGDIAL Conference.

[32]  Frank Dellaert,et al.  iSAM: Incremental Smoothing and Mapping , 2008, IEEE Transactions on Robotics.

[33]  Roland Siegwart,et al.  Bayesian space conceptualization and place classification for semantic maps in mobile robotics , 2008, Robotics Auton. Syst..

[34]  Wolfram Burgard,et al.  Supervised semantic labeling of places using information extracted from sensor data , 2007, Robotics Auton. Syst..

[35]  Luke Fletcher,et al.  A Situationally Aware Voice‐commandable Robotic Forklift Working Alongside People in Unstructured Outdoor Environments , 2015, J. Field Robotics.

[36]  Matthew R. Walter,et al.  Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.

[37]  Yu Zhang,et al.  Temporal Spatial Inverse Semantics for Robots Communicating with Humans , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[38]  Hugh Durrant-Whyte,et al.  Simultaneous localization and mapping (SLAM): part II , 2006 .

[39]  Dieter Fox,et al.  Following directions using statistical machine translation , 2010, 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI).