Learning spatial-semantic representations from natural language descriptions and scene classifications

We describe a semantic mapping algorithm that learns human-centric environment models by interpreting natural language utterances. Underlying the approach is a coupled metric, topological, and semantic representation of the environment that enables the method to fuse information from natural language descriptions with low-level metric and appearance data. We extend earlier work with a novel formulation that incorporates spatial layout into a topological representation of the environment. We also describe a factor graph formulation of the semantic properties that encodes human-centric concepts such as type and colloquial name for each mapped region. The algorithm infers these properties by combining the user's natural language descriptions with image- and laser-based scene classification. We also propose a mechanism to more effectively ground natural language descriptions of distant regions using semantic cues from other modalities. We describe how the algorithm employs this learned semantic information to propose valid topological hypotheses, leading to more accurate topological and metric maps. We demonstrate that integrating language with other sensor data increases the accuracy of the achieved spatial-semantic representation of the environment.

[1]  Benjamin Kuipers,et al.  The Spatial Semantic Hierarchy , 2000, Artif. Intell..

[2]  Guido Bugmann,et al.  Corpus-Based Robotics: A Route Instruction Example , 2003 .

[3]  Marjorie Skubic,et al.  Spatial language for human-robot dialogs , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[4]  Wolfram Burgard,et al.  Supervised Learning of Places from Range Data using AdaBoost , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[5]  Benjamin Kuipers,et al.  Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions , 2006, AAAI.

[6]  Javier González,et al.  Consistent observation grouping for generating metric-topological maps that improves robot localization , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[7]  Rainer Stiefelhagen,et al.  Visual recognition of pointing gestures for human-robot interaction , 2007, Image Vis. Comput..

[8]  Wolfram Burgard,et al.  Conceptual spatial representations for indoor mobile robots , 2008, Robotics Auton. Syst..

[9]  Frank Dellaert,et al.  iSAM: Incremental Smoothing and Mapping , 2008, IEEE Transactions on Robotics.

[10]  Matthias Scheutz,et al.  What to do and how to do it: Translating natural language directives into temporal and dynamic logic representation for goal management and action execution , 2009, 2009 IEEE International Conference on Robotics and Automation.

[11]  Barbara Caputo,et al.  Multi-modal Semantic Place Classification , 2010, Int. J. Robotics Res..

[12]  Joris M. Mooij,et al.  libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models , 2010, J. Mach. Learn. Res..

[13]  Dieter Fox,et al.  Following directions using statistical machine translation , 2010, HRI 2010.

[14]  Stefanie Tellex,et al.  Toward understanding natural language directions , 2010, HRI 2010.

[15]  Raymond J. Mooney,et al.  Learning to Interpret Natural Language Navigation Instructions from Observations , 2011, Proceedings of the AAAI Conference on Artificial Intelligence.

[16]  Matthew R. Walter,et al.  Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.

[17]  Seth J. Teller,et al.  Following and interpreting narrated guided tours , 2011, 2011 IEEE International Conference on Robotics and Automation.

[18]  Patric Jensfelt,et al.  Large-scale semantic mapping and reasoning with heterogeneous modalities , 2012, 2012 IEEE International Conference on Robotics and Automation.

[19]  Luke S. Zettlemoyer,et al.  A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.

[20]  Stefanie Tellex,et al.  Toward Information Theoretic Human-Robot Dialog , 2012, Robotics: Science and Systems.

[21]  Matthew R. Walter,et al.  Learning Semantic Maps from Natural Language Descriptions , 2013, Robotics: Science and Systems.