Miscommunication Detection and Recovery in Situated Human–Robot Dialogue

Even without speech recognition errors, robots may face difficulties interpreting natural-language instructions. We present a method for robustly handling miscommunication between people and robots in task-oriented spoken dialogue. This capability is implemented in TeamTalk, a conversational interface to robots that supports detection of, and recovery from, two situated grounding problems: referential ambiguity and impossible actions. We introduce a representation that detects these problems and a nearest-neighbor learning algorithm that selects recovery strategies for a virtual robot. When the robot encounters a grounding problem, it looks back on its interaction history to consider how it resolved similar situations. The learning method is initially trained on crowdsourced data and then supplemented with interactions from a longitudinal user study in which six participants performed navigation tasks with the robot. We compare results obtained with a general model to those obtained with user-specific models and find that user-specific models perform best on measures of dialogue efficiency, while the general model yields the highest agreement with human judges. Our overall contribution is a novel approach to detecting and recovering from miscommunication in dialogue by including situated context, namely, information from a robot’s path planner and surroundings.
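The idea of selecting a recovery strategy by looking back at similar past situations can be illustrated with a minimal nearest-neighbor sketch. It is not the TeamTalk implementation: the `Situation` features, the distance function, the value of `k`, and the strategy labels (`list_options`, `ask_for_feature`, `propose_alternative`, `declare_problem`) are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    """Hypothetical situated-context features for one grounding problem."""
    problem: str         # "ambiguity" or "impossible" (assumed labels)
    num_candidates: int  # referents in the surroundings matching the instruction
    path_exists: bool    # whether the path planner found a route


def distance(a: Situation, b: Situation) -> float:
    """Illustrative mixed distance: mismatch on categorical features,
    capped and scaled difference on the numeric one."""
    d = 0.0 if a.problem == b.problem else 1.0
    d += 0.0 if a.path_exists == b.path_exists else 1.0
    d += min(abs(a.num_candidates - b.num_candidates), 3) / 3.0
    return d


def select_strategy(history, query, k=3):
    """Return the majority strategy among the k past situations nearest to query."""
    nearest = sorted(history, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(strategy for _, strategy in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical interaction history: (situation, strategy that resolved it).
HISTORY = [
    (Situation("ambiguity", 2, True), "list_options"),
    (Situation("ambiguity", 3, True), "list_options"),
    (Situation("ambiguity", 2, True), "ask_for_feature"),
    (Situation("impossible", 0, False), "propose_alternative"),
    (Situation("impossible", 0, False), "propose_alternative"),
    (Situation("impossible", 1, False), "declare_problem"),
]

if __name__ == "__main__":
    print(select_strategy(HISTORY, Situation("ambiguity", 2, True)))
    print(select_strategy(HISTORY, Situation("impossible", 0, False)))
```

In this sketch, a general model would correspond to one shared history, while a user-specific model would append each participant's own interactions to a separate copy of that history.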
