Language Understanding for Field and Service Robots in a Priori Unknown Environments

Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. This progress now creates an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Realizing this opportunity requires an efficient and flexible medium through which humans can communicate with collaborative robots. Natural language provides one such medium, and through significant progress in statistical methods for natural-language understanding, robots are now able to interpret a diverse array of free-form navigation, manipulation, and mobile-manipulation commands. However, most contemporary approaches require a detailed, prior spatial-semantic map of the robot's environment that models the space of possible referents of an utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural-language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a "sensor"—inferring spatial, topological, and semantic information implicit in natural-language utterances and then exploiting this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic language grounding model and infer a distribution over a symbolic representation of the robot's action space, consistent with the utterance. We use imitation learning to identify a belief-space policy that reasons over the environment and behavior distributions. We evaluate our framework through a variety of different navigation and mobile-manipulation experiments involving an unmanned ground vehicle, a robotic wheelchair, and a mobile manipulator, demonstrating that the algorithm can follow natural-language instructions without prior knowledge of the environment.
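The pipeline sketched in the abstract—treating language as a sensor that induces a belief over latent environment models, and then grounding the command against that belief—can be illustrated with a minimal particle-filter-style sketch. This is purely illustrative: all names (`init_belief`, `update_belief`, `most_likely_behavior`, the `landmarks` fields) are hypothetical, and the paper's actual environment and grounding models are learned rather than hand-coded.

```python
import random

def init_belief(hypotheses, num_particles=100):
    """Uniform initial belief: each particle is one candidate environment model."""
    return [random.choice(hypotheses) for _ in range(num_particles)]

def update_belief(particles, cue):
    """Treat a spatial-semantic cue from the utterance as an observation:
    reweight and resample particles by how well they explain the cue."""
    weights = [1.0 if cue in env["landmarks"] else 0.1 for env in particles]
    return random.choices(particles, weights=weights, k=len(particles))

def most_likely_behavior(particles, goal):
    """Ground the command against the belief: majority vote over particles,
    choosing between navigating to the referent and exploring further."""
    votes = {}
    for env in particles:
        if goal in env["landmarks"]:
            action = ("navigate", env["name"])
        else:
            action = ("explore", env["name"])
        votes[action] = votes.get(action, 0) + 1
    return max(votes, key=votes.get)
```

In this toy version, an utterance such as "go to the elevator" both sharpens the belief (environments containing an elevator gain weight) and selects a behavior consistent with the most probable environment; the paper's belief-space policy plays the analogous role over learned distributions.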