Improving Object Disambiguation from Natural Language using Empirical Models

Robots, virtual assistants, and other intelligent agents need to effectively interpret verbal references to environmental objects in order to successfully interact and collaborate with humans in complex tasks. However, object disambiguation can be a challenging task due to ambiguities in natural language. To reduce uncertainty when describing an object, humans often use a combination of unique object features and locative prepositions, that is, prepositional phrases that describe where an object is located relative to other features (i.e., reference objects) in a scene. We present a new system for object disambiguation in cluttered environments based on probabilistic models of unique object features and spatial relationships. Our work extends prior models of spatial relationship semantics by collecting and encoding empirical data from a series of crowdsourced studies to better understand how and when people use locative prepositions, how reference objects are chosen, and how to model prepositional geometry in 3D space (e.g., capturing distinctions between "next to" and "beside"). Our approach also introduces new techniques for responding to compound locative phrases of arbitrary complexity and proposes a new metric for disambiguation confidence. An experimental validation showed that our method improves object disambiguation accuracy and performance over past approaches.
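The abstract describes combining probabilistic models of unique object features with models of spatial relationships, plus a confidence metric for the resulting disambiguation. As a rough illustration only (not the paper's actual model), one common way to combine such evidence is to score each candidate object by the product of a feature-match likelihood and a locative-phrase likelihood, normalize over candidates, and take the margin between the top two candidates as a confidence signal. All names and the margin-based confidence below are illustrative assumptions:

```python
# Hypothetical sketch, NOT the paper's model: combine a feature likelihood
# with a spatial-relation likelihood per candidate, normalize, and report a
# simple top-two margin as a stand-in for a disambiguation confidence metric.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    feature_prob: float   # assumed given: P(described features | object)
    spatial_prob: float   # assumed given: P(locative phrase | object, reference)


def disambiguate(candidates):
    """Return the best candidate and a confidence score in [0, 1]."""
    scores = [c.feature_prob * c.spatial_prob for c in candidates]
    total = sum(scores)
    if total == 0:
        return None, 0.0  # no candidate is consistent with the description
    probs = [s / total for s in scores]
    ranked = sorted(zip(candidates, probs), key=lambda cp: cp[1], reverse=True)
    best, p1 = ranked[0]
    p2 = ranked[1][1] if len(ranked) > 1 else 0.0
    return best, p1 - p2  # margin between the two most likely referents


# e.g., resolving "the red mug next to the laptop" among three candidates
mugs = [
    Candidate("red mug near the laptop", 0.9, 0.8),
    Candidate("red mug by the window", 0.9, 0.2),
    Candidate("blue mug near the laptop", 0.3, 0.8),
]
best, conf = disambiguate(mugs)
```

A low margin would indicate residual ambiguity, which is the kind of situation where a system might ask a clarifying question rather than act.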
