INTEGRATION OF A VOICE RECOGNITION SYSTEM IN A SOCIAL ROBOT

Human–robot interaction (HRI), the study of humans, robots, and the ways in which they influence each other, is one of the main fields of robotics research. Within this field, dialogue systems and voice interaction play an important role. Natural human–robot dialogue assumes that the robot can accurately recognize what the human wants to transmit verbally, and even its semantic meaning, but this is not always achieved. In this article we describe the steps and requirements involved in endowing the personal social robot Maggie, developed at the University Carlos III of Madrid, with the capability to understand natural language spoken by any human. We analyze the possibilities offered by current software and hardware alternatives by testing them in real environments. We report accurate data on speech recognition performance in different environments, using modern audio acquisition systems and examining less commonly studied parameters such as user age, gender, intonation, volume, and language. Finally, we propose a new model that classifies recognition results as accepted or rejected based on a second automatic speech recognition (ASR) opinion. This approach takes into account the precalculated success rate in noise intervals for each recognition framework, decreasing the rates of false positives and false negatives.
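The acceptance model described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the noise intervals, success-rate values, recognizer names, and decision threshold are all assumptions introduced only to show the idea of combining a second ASR opinion with precalculated per-recognizer success rates.

```python
# Hypothetical sketch of a two-ASR accept/reject classifier.
# All names and numeric values below are illustrative assumptions.

def noise_band(noise_db: float) -> str:
    """Map a measured ambient noise level (dB) to a coarse interval."""
    if noise_db < 40:
        return "quiet"
    if noise_db < 60:
        return "moderate"
    return "loud"

# Precalculated success rates per recognizer per noise interval
# (assumed values, standing in for rates measured offline).
SUCCESS_RATES = {
    "asr_primary": {"quiet": 0.92, "moderate": 0.80, "loud": 0.55},
    "asr_second":  {"quiet": 0.88, "moderate": 0.75, "loud": 0.50},
}

def accept(result_primary: str, result_second: str, noise_db: float,
           threshold: float = 0.7) -> bool:
    """Accept the primary hypothesis when the second ASR agrees and the
    combined evidence is strong, or reject when the recognizers disagree
    and the primary's track record at this noise level is weak."""
    band = noise_band(noise_db)
    p1 = SUCCESS_RATES["asr_primary"][band]
    if result_primary == result_second:
        # Agreement: probability that at least one recognizer is right.
        p2 = SUCCESS_RATES["asr_second"][band]
        return 1 - (1 - p1) * (1 - p2) >= threshold
    # Disagreement: fall back on the primary recognizer's success rate.
    return p1 >= threshold
```

In this sketch, a hypothesis that both recognizers agree on in a quiet room is accepted, while a disagreement in a loud environment is rejected; the point is that the decision depends jointly on inter-recognizer agreement and on the noise-conditioned reliability of each framework.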
