Learning to Mediate Perceptual Differences in Situated Human-Robot Dialogue

In human-robot dialogue, although a robot and its human partner are co-present in a shared environment, they have significantly mismatched perceptual capabilities (e.g., in recognizing objects in the surroundings). When a shared perceptual basis is missing, it becomes difficult for the robot to identify referents in the physical world that are referred to by the human (i.e., a problem of referential grounding). To overcome this problem, we have developed an optimization-based approach that allows the robot to detect and adapt to perceptual differences. Through online interaction with the human, the robot can learn a set of weights indicating how reliably each dimension of its perception of the environment (e.g., object type, object color) maps to the human’s linguistic descriptors, and thus adjust its word models accordingly. Our empirical evaluation has shown that this weight-learning approach can successfully adjust the weights to reflect the robot’s perceptual limitations. The learned weights, together with the updated word models, lead to a significant improvement in referential grounding in subsequent dialogues.
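The core idea above — scoring candidate referents by a weighted match between the human's descriptors and the robot's percepts, and shifting weight toward perceptual dimensions that prove reliable — can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the exact-match compatibility measure, and the simple moving-average weight update are all illustrative assumptions, standing in for the optimization-based formulation the paper actually uses.

```python
def compat(word, perceived):
    # Toy compatibility measure: 1.0 on exact match, else 0.0.
    # (The paper's word models would give a graded score instead.)
    return 1.0 if word == perceived else 0.0

def grounding_score(descriptors, percept, weights):
    # Weighted agreement between the human's descriptors and the robot's
    # percept of one candidate object, summed over shared dimensions.
    return sum(weights[d] * compat(descriptors[d], percept[d])
               for d in descriptors if d in percept)

def update_weights(weights, descriptors, percept, lr=0.1):
    # After the referent is confirmed through dialogue, nudge each
    # dimension's weight toward how well that dimension actually agreed:
    # reliable dimensions (e.g., type) gain weight, unreliable ones
    # (e.g., color under poor lighting) lose it. Illustrative update rule.
    for d in descriptors:
        if d in percept:
            weights[d] += lr * (compat(descriptors[d], percept[d]) - weights[d])
    return weights

# Example: the robot's color perception is unreliable, its type perception is not.
weights = {"type": 0.5, "color": 0.5}
descriptors = {"type": "mug", "color": "red"}
percept = {"type": "mug", "color": "orange"}  # camera misreads the color
update_weights(weights, descriptors, percept)
# "type" weight rises toward 1.0; "color" weight decays toward 0.0,
# so future grounding relies less on the unreliable color dimension.
```

Repeated over many confirmed references, the weights come to encode which perceptual dimensions the robot can trust, which is the adaptation effect the evaluation measures.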
