Bayesian networks based multi-modality fusion for error handling in human-robot dialogues under noisy conditions

Abstract: In this paper, we introduce a probabilistic-model-based architecture for error handling in human–robot spoken dialogue systems under adverse audio conditions. In this architecture, a Bayesian network framework is used to interpret multi-modal signals in the spoken dialogue between a tour-guide robot and visitors under mass-exhibition conditions. In particular, we report on experiments interpreting speech and laser scanner signals in the dialogue management system of the autonomous tour-guide robot RoboX, which was successfully deployed at the Swiss National Exhibition (Expo.02). Correctly interpreting a user’s (visitor’s) goal or intention at each dialogue state is key to successful voice-enabled communication between tour-guide robots and visitors. To infer the visitor’s goal under the uncertainty intrinsic to these two modalities, we introduce Bayesian networks that combine noisy speech recognition with data from a laser scanner, which is independent of acoustic noise. Experiments with real-world data collected during the operation of RoboX at Expo.02 demonstrate the effectiveness of the approach in adverse environments. The proposed architecture makes it possible to model error-handling processes in spoken dialogue systems that involve complex combinations of different multi-modal information sources, in cases where such information is available.
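To make the fusion idea concrete, here is a minimal sketch of how a Bayesian network can combine a noisy speech observation with an acoustically independent laser observation to infer a visitor's goal. It assumes the simplest possible structure (a single goal node with two child observations that are conditionally independent given the goal); this is an illustrative assumption, not the network used for RoboX. The goal states, observation values, probability tables, and the names `PRIOR`, `P_SPEECH`, `P_LASER`, and `posterior` are all hypothetical.

```python
# Minimal sketch (hypothetical model): goal -> speech, goal -> laser.
# All states and probabilities below are made up for illustration;
# they are NOT values from RoboX or Expo.02.

# Prior over the visitor's (hypothetical) goal.
PRIOR = {"continue_tour": 0.5, "leave": 0.5}

# P(speech recognizer output | goal): weakly informative, since speech
# recognition is unreliable in a noisy exhibition hall.
P_SPEECH = {
    "continue_tour": {"yes": 0.6, "no": 0.4},
    "leave": {"yes": 0.3, "no": 0.7},
}

# P(laser reading | goal): the laser scanner is unaffected by acoustic noise.
P_LASER = {
    "continue_tour": {"present": 0.9, "absent": 0.1},
    "leave": {"present": 0.2, "absent": 0.8},
}


def posterior(speech=None, laser=None):
    """Return P(goal | evidence) by Bayes' rule.

    An observation passed as None is unobserved; because it is a leaf
    node, summing it out contributes a factor of 1, so it is skipped.
    """
    unnormalized = {}
    for goal, prior in PRIOR.items():
        p = prior
        if speech is not None:
            p *= P_SPEECH[goal][speech]
        if laser is not None:
            p *= P_LASER[goal][laser]
        unnormalized[goal] = p
    total = sum(unnormalized.values())
    return {goal: p / total for goal, p in unnormalized.items()}


if __name__ == "__main__":
    print(posterior(speech="yes"))
    print(posterior(speech="yes", laser="absent"))
```

With these made-up tables, a recognized "yes" alone gives P(continue_tour) ≈ 0.67, but adding a laser reading of "absent" lowers it to 0.2. This illustrates how an acoustically robust modality can override a likely misrecognition. A real system would learn or elicit the tables and would likely use a richer network structure.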
