A framework for learning semantic maps from grounded natural language descriptions

This paper describes a framework that enables robots to efficiently learn human-centric models of their environment from natural language descriptions. Typical semantic mapping approaches augment metric maps with higher-level properties of the robot’s surroundings (e.g., place type, object locations) that can be inferred from the robot’s sensor data, but they do not use this information to improve the metric map itself. The novelty of our algorithm lies in fusing high-level knowledge that people can uniquely provide through speech with metric information from the robot’s low-level sensor streams. Our method jointly estimates a hybrid metric, topological, and semantic representation of the environment, which we call a semantic graph. This semantic graph provides a common framework in which we integrate information that the user communicates (e.g., labels and spatial relations) with metric observations from low-level sensors. Our algorithm efficiently maintains a factored distribution over semantic graphs based upon the stream of natural language and low-level sensor information. We detail the means by which the framework incorporates knowledge conveyed by the user’s descriptions, including the ability to reason over expressions that reference as-yet-unknown regions of the environment. We evaluate the algorithm’s ability to learn human-centric maps of several different environments and analyze both the knowledge inferred from language and the utility of the learned maps. The results demonstrate that incorporating information from free-form descriptions increases the metric, topological, and semantic accuracy of the recovered environment model.
