A Robust Speech Recognition System for Communication Robots in Noisy Environments

The range of applications for communication robots could be greatly expanded by automatic speech recognition (ASR) systems that are more robust to noise and to speakers of different ages. Past research has proposed and evaluated several modules for improving the robustness of ASR systems in noisy environments. However, the performance of these modules may degrade when they are applied to robots, because of problems caused by distant speech and the robot's own noise. In this paper, we implemented the individual modules in a humanoid robot and evaluated ASR performance in a real-world noisy environment for both adults' and children's speech. The performance of each module was verified by adding different levels of real environmental noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70-dBA noise. Further evaluation on adult speech recorded in a real noisy environment yielded 73% word accuracy.
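
The abstract reports results as word accuracy but does not state how it was computed. As a minimal sketch, assuming the conventional definition (N - S - D - I) / N, where N is the number of reference words and S, D, and I are the substitutions, deletions, and insertions in the best word-level alignment, the metric could be calculated as follows. The function name `word_accuracy` and the sample sentences are hypothetical and do not come from the paper.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy under the assumed definition (N - S - D - I) / N.

    S + D + I is taken as the word-level Levenshtein distance between
    the reference transcript and the recognizer output.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]
    return 1.0 - errors / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one deleted word out of four reference words.
    print(word_accuracy("turn on the light", "turn the light"))
```

Because insertions count as errors, this measure can fall below zero when the recognizer outputs many spurious words, which is one reason it is usually reported alongside the noise level at which it was measured.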
