Grounding spatial relations for human-robot interaction

We propose a system for human-robot interaction that learns models for both spatial prepositions and object recognition. Our system grounds the meaning of an input sentence in visual percepts from the robot's sensors in order to send an appropriate command to the PR2 or to respond to spatial queries. To perform this grounding, the system recognizes the objects in the scene, determines which spatial relations hold between those objects, and semantically parses the input sentence. The system then uses the visual and spatial information together with the semantic parse to interpret statements that refer to objects (nouns) and their spatial relationships (prepositions), and to execute commands (actions). The semantic parse is inherently compositional, allowing the robot to understand complex commands that refer to multiple objects and relations, such as: “Move the cup close to the robot to the area in front of the plate and behind the tea box”. Our system correctly parses 94% of the 210 online test sentences, correctly interprets 91% of the correctly parsed sentences, and correctly executes 89% of the correctly interpreted sentences.
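
As a rough illustration of the compositional grounding the abstract describes, the sketch below resolves the example command using hand-written distance- and direction-based relation scores rather than the paper's learned preposition and object models; all object names, coordinates, and scoring functions are hypothetical assumptions made only for this example.

import numpy as np

# Hypothetical scene: recognized labels and 2D table-plane centroids.
# All names and coordinates are illustrative, not from the paper.
objects = {
    "cup_1":   np.array([0.15, 0.10]),   # near the robot
    "cup_2":   np.array([0.80, 0.80]),
    "plate":   np.array([0.30, 0.50]),
    "tea_box": np.array([0.70, 0.50]),
    "robot":   np.array([0.00, 0.00]),
}

FRONT = np.array([1.0, 0.0])  # assumed "front" axis of the table frame

def close_to(a, b, scale=0.5):
    """Score for 'a close to b': decays with Euclidean distance."""
    return float(np.exp(-np.linalg.norm(a - b) / scale))

def in_front_of(point, landmark, axis=FRONT):
    """Score for 'point in front of landmark' along the assumed front axis."""
    offset = point - landmark
    return max(0.0, float(offset @ axis) / (np.linalg.norm(offset) + 1e-9))

def behind(point, landmark, axis=FRONT):
    """'Behind' is 'in front of' with the axis reversed."""
    return in_front_of(point, landmark, -axis)

# Resolve "the cup close to the robot": among objects labeled as cups,
# pick the one maximizing the 'close to' score w.r.t. the robot.
cups = [name for name in objects if name.startswith("cup")]
referent = max(cups, key=lambda name: close_to(objects[name], objects["robot"]))

# Ground "the area in front of the plate and behind the tea box":
# score every point on a grid by the product of both relation scores.
xs, ys = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
scores = [in_front_of(p, objects["plate"]) * behind(p, objects["tea_box"]) for p in grid]
goal = grid[int(np.argmax(scores))]

print(f"move '{referent}' to approximately {goal}")

The product of relation scores plays the role of conjunction here, which is what makes the interpretation compositional: each preposition contributes its own score, and the goal region is whatever jointly satisfies them best.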
