Toward understanding natural language directions

Speaking in unconstrained natural language is an intuitive and flexible way for humans to interact with robots. Understanding this kind of linguistic input is challenging because diverse words and phrases must be mapped into structures the robot can understand, and elements of those structures must be grounded in an uncertain environment. We present a system that follows natural language directions by extracting a sequence of spatial description clauses from the linguistic input and then inferring the most probable path through the environment, given only the environmental geometry and detected visible objects. We use a probabilistic graphical model that factors into three key components. The first component grounds landmark phrases such as “the computers” in the robot’s perceptual frame by exploiting co-occurrence statistics from a database of tagged images such as Flickr. The second, a spatial reasoning component, judges how well spatial relations such as “past the computers” describe a path. Finally, verb phrases such as “turn right” are modeled according to the amount of change in orientation along the path. Our system follows 60% of the directions in our corpus to within 15 meters of the true destination, significantly outperforming other approaches.
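To make the factored model concrete, here is a minimal, self-contained sketch of the kind of scoring scheme the abstract describes. All names, the toy co-occurrence table, and the scoring functions are hypothetical illustrations, not the paper's implementation: each spatial description clause is scored against one segment of a candidate path, and the three factors (landmark grounding, spatial relation, verb) multiply, so their logs add.

```python
import math
from dataclasses import dataclass

@dataclass
class SDC:
    """A spatial description clause, e.g. 'go past the computers'."""
    verb: str          # "go straight", "turn right", ...
    relation: str      # "past", "to", "through", ...
    landmark: str      # "computers", "kitchen", ...

@dataclass
class Segment:
    """One segment of a candidate path through the map."""
    heading_change: float   # signed orientation change, radians
    visible_objects: list   # object labels detected along this segment

# Toy co-occurrence table standing in for statistics mined from a
# tagged-image corpus (e.g. Flickr): relatedness of a landmark phrase
# to a detected object label. Values are illustrative only.
COOCCUR = {
    ("computers", "monitor"): 0.6,
    ("computers", "keyboard"): 0.3,
    ("kitchen", "refrigerator"): 0.7,
}

def landmark_score(landmark: str, seg: Segment) -> float:
    """Ground the landmark phrase in what the robot actually detected."""
    best = max((COOCCUR.get((landmark, obj), 0.0)
                for obj in seg.visible_objects), default=0.0)
    return max(best, 1e-6)  # floor keeps the log finite

def verb_score(verb: str, seg: Segment) -> float:
    """Score a verb by the orientation change it implies for the segment."""
    target = {"turn right": -math.pi / 2, "turn left": math.pi / 2}.get(verb, 0.0)
    return math.exp(-abs(seg.heading_change - target))

def relation_score(relation: str, seg: Segment) -> float:
    """Placeholder for a learned spatial-relation model ('past', 'to', ...)."""
    return 0.5  # a real system scores the path geometry against the landmark

def path_log_prob(sdcs: list, segments: list) -> float:
    """Factored score: one clause per segment; factors multiply, logs add."""
    return sum(
        math.log(landmark_score(s.landmark, seg))
        + math.log(verb_score(s.verb, seg))
        + math.log(relation_score(s.relation, seg))
        for s, seg in zip(sdcs, segments)
    )

if __name__ == "__main__":
    sdcs = [SDC("go straight", "past", "computers"),
            SDC("turn right", "to", "kitchen")]
    candidates = [
        [Segment(0.0, ["monitor"]), Segment(-math.pi / 2, ["refrigerator"])],
        [Segment(0.0, ["plant"]),   Segment(math.pi / 2,  ["refrigerator"])],
    ]
    best = max(candidates, key=lambda p: path_log_prob(sdcs, p))
    print("best path score:", path_log_prob(sdcs, best))
```

This sketch fixes a one-to-one pairing of clauses to segments for brevity; the system described in the abstract additionally infers the most probable path over the whole environment, which requires searching over candidate paths and over how clauses align to them.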
