Language-guided Semantic Mapping and Mobile Manipulation in Partially Observable Environments

Recent advances in data-driven models for grounded language understanding have enabled robots to interpret increasingly complex instructions. Two fundamental limitations of these methods are that most require a complete model of the environment to be known a priori, and that they reason over world representations that are flat and unnecessarily detailed, which limits scalability. Recent semantic mapping methods address partial observability by exploiting language as a sensor to infer a distribution over the topological, metric, and semantic properties of the environment. However, maintaining a distribution over highly detailed maps that can support the grounding of diverse instructions is computationally expensive and hinders real-time human-robot collaboration. We propose a novel framework that learns to adapt perception according to the task in order to maintain compact distributions over semantic maps. Experiments with a mobile manipulator demonstrate more efficient instruction following in a priori unknown environments.
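Because the abstract only names the technique, a minimal sketch may help make the idea concrete. The sketch below is an illustration under stated assumptions, not the paper's implementation: the keyword-based `relevant_classes` grounding, the hand-written `TASK_VOCABULARY` table, and the `MapParticle` structure are hypothetical stand-ins for the learned models and the factored map distribution the abstract describes. It shows the central mechanism: each particle hypothesizes a semantic map, and only detections whose class is relevant to the current instruction are fused into those hypotheses, which keeps each map compact.

```python
from dataclasses import dataclass, field

# Hypothetical grounding table mapping instruction words to perceptual
# classes. A real system would learn this; here it is a hand-written stand-in.
TASK_VOCABULARY = {
    "ball": {"ball"},
    "crate": {"crate"},
    "box": {"box", "crate"},
    "table": {"table"},
}


def relevant_classes(instruction):
    """Select the subset of detector classes needed to ground the instruction."""
    classes = set()
    for word in instruction.lower().split():
        classes |= TASK_VOCABULARY.get(word.strip(".,"), set())
    return classes


@dataclass
class MapParticle:
    """One hypothesis over the semantic map: class -> list of (x, y) locations."""
    weight: float
    objects: dict = field(default_factory=dict)


def update(particles, detections, classes):
    """Fuse only task-relevant detections into each map hypothesis, then
    renormalize the particle weights (a crude likelihood stand-in)."""
    for p in particles:
        for cls, pos, conf in detections:
            if cls not in classes:
                continue  # adaptive perception: ignore task-irrelevant detections
            p.objects.setdefault(cls, []).append(pos)
            p.weight *= conf
    total = sum(p.weight for p in particles) or 1.0
    for p in particles:
        p.weight /= total
    return particles


if __name__ == "__main__":
    instruction = "pick up the ball near the crate"
    classes = relevant_classes(instruction)

    particles = [MapParticle(weight=1.0) for _ in range(3)]
    detections = [
        ("ball", (1.0, 2.0), 0.9),
        ("chair", (0.0, 1.0), 0.8),  # irrelevant to this task; never mapped
        ("crate", (1.5, 2.2), 0.7),
    ]
    update(particles, detections, classes)
    print("relevant classes:", classes)
    print("particle 0 map:", particles[0].objects)
```

In this toy version the benefit is that each particle stores only the two instruction-relevant classes rather than every detected object, so the cost of maintaining the map distribution scales with the task rather than with the clutter in the scene.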
