Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation

This paper describes a new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments. Previous approaches have used models with fixed structure to infer the likelihood of a sequence of actions given the environment and the command. In contrast, our framework, called Generalized Grounding Graphs (G3), dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command's hierarchical and compositional semantic structure. Our system performs inference in the model to successfully find and execute plans corresponding to natural language commands such as "Put the tire pallet on the truck." The model is trained using a corpus of commands collected using crowdsourcing. We pair each command with robot actions and use the corpus to learn the parameters of the model. We evaluate the robot's performance by inferring plans from natural language commands, executing each plan in a realistic robot simulator, and asking users to evaluate the system's performance. We demonstrate that our system can successfully follow many natural language commands from the corpus.
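The core idea above, that a graphical model is instantiated per command from its compositional structure, with log-linear factors scoring how well each phrase corresponds to a candidate grounding, can be sketched in miniature. Everything below is a hypothetical illustration, not the paper's implementation: the `Phrase` tree, the toy word-match features (standing in for the paper's richer geometric/spatial features), and the candidate labels are all invented for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Phrase:
    """A node in the command's compositional parse structure."""
    text: str
    children: List["Phrase"] = field(default_factory=list)

def features(phrase: Phrase, grounding: str) -> Dict[str, float]:
    # Toy binary features pairing each word with the candidate
    # grounding's label; a stand-in for real spatial features.
    return {f"word={w}_label={grounding}": 1.0 for w in phrase.text.split()}

def factor_score(weights: Dict[str, float], phrase: Phrase, grounding: str) -> float:
    # Log-linear factor: exp(w . f(phrase, grounding)).
    return math.exp(sum(weights.get(k, 0.0) * v
                        for k, v in features(phrase, grounding).items()))

def best_grounding(weights: Dict[str, float], phrase: Phrase,
                   candidates: List[str]) -> str:
    # Inference for one grounding variable: maximize the product of
    # this phrase's factor and the best factors of its children.
    def total(g: str) -> float:
        score = factor_score(weights, phrase, g)
        for child in phrase.children:
            score *= max(factor_score(weights, child, c) for c in candidates)
        return score
    return max(candidates, key=total)

# "Put the tire pallet on the truck": the object phrase should ground
# to the pallet and the destination phrase to the truck.
weights = {"word=pallet_label=pallet1": 2.0,
           "word=truck_label=truck1": 2.0}
obj = Phrase("the tire pallet")
dest = Phrase("on the truck")
print(best_grounding(weights, obj, ["pallet1", "truck1"]))   # -> pallet1
print(best_grounding(weights, dest, ["pallet1", "truck1"]))  # -> truck1
```

In the paper the feature weights are learned from the crowdsourced corpus of command/action pairs (in the spirit of conditional random field training, cf. reference [5]), and inference is joint over all grounding variables rather than per phrase as in this simplified sketch.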
