HMM and BPNN based speech recognition system for home service robot

This paper proposes a two-stage speech recognition system for a home service robot, based on a hidden Markov model (HMM) and a back-propagation neural network (BPNN). Because a home service robot interacts with different users, the system must be speaker-independent and robust. The proposed system uses two learning stages to build the word models. In the first stage, the HMM is used to calculate Gaussian mixture model (GMM) likelihood probabilities. In the second stage, these probabilities serve as the input units of the neural network. The home service robot May-1 was designed and implemented to realize the speech recognition system. Experimental results show that the robot successfully completes the follow-me, name recognition, and room recognition tasks of the RoboCup@Home league competition.
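The two-stage idea can be illustrated with a minimal Python sketch, which is not the authors' implementation. It assumes scalar (1-D) features, one small left-to-right HMM per word with GMM emissions, and a single-hidden-layer sigmoid network trained by plain back-propagation. All model parameters, the toy training sequences, the network size, and the helper names (`hmm_log_likelihood`, `score_vector`, `BPNN`, `recognize`) are hypothetical and chosen only for illustration.

```python
import math
import random

def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")

def log_sum_exp(values):
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def gmm_log_pdf(x, mixture):
    """Log density of scalar x under a GMM given as (weight, mean, variance) triples."""
    return log_sum_exp([
        math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for w, mu, var in mixture
    ])

def hmm_log_likelihood(obs, start, trans, emissions):
    """Stage 1: forward algorithm in the log domain, log P(obs | word HMM)."""
    n = len(start)
    alpha = [safe_log(start[s]) + gmm_log_pdf(obs[0], emissions[s]) for s in range(n)]
    for x in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + gmm_log_pdf(x, emissions[s])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def score_vector(obs, models):
    """Per-frame log-likelihood under every word model, shifted so the best is 0."""
    lls = [hmm_log_likelihood(obs, *m) / len(obs) for m in models]
    top = max(lls)
    return [v - top for v in lls]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BPNN:
    """Stage 2: one hidden layer, sigmoid units, squared-error back-propagation."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        # Error terms are computed before any weight is updated.
        d_out = [(y[k] - target[k]) * y[k] * (1 - y[k]) for k in range(len(y))]
        d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v

def recognize(obs, models, net):
    _, y = net.forward(score_vector(obs, models))
    return max(range(len(y)), key=y.__getitem__)

# Hypothetical two-state word models: (start probs, transition matrix, per-state GMMs).
WORD_MODELS = [
    ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]],
     [[(0.6, 0.0, 0.5), (0.4, 0.3, 1.0)], [(0.7, 1.0, 0.5), (0.3, 0.8, 1.0)]]),
    ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]],
     [[(0.6, 3.0, 0.5), (0.4, 3.2, 1.0)], [(0.7, 4.0, 0.5), (0.3, 3.9, 1.0)]]),
]

# Toy labelled sequences (made up): (observations, word index).
TRAIN = [([0.1, 0.0, 0.9, 1.1], 0), ([-0.2, 0.2, 1.0, 0.8], 0),
         ([3.1, 2.9, 4.2, 3.8], 1), ([2.8, 3.2, 4.0, 4.1], 1)]

features = [(score_vector(obs, WORD_MODELS), label) for obs, label in TRAIN]
net = BPNN(n_in=len(WORD_MODELS), n_hidden=3, n_out=len(WORD_MODELS))
for _ in range(500):
    for x, label in features:
        net.train_step(x, [1.0 if k == label else 0.0 for k in range(len(WORD_MODELS))])
```

Calling `recognize(obs, WORD_MODELS, net)` on a new observation sequence returns the index of the word whose output unit is largest. In this sketch the HMM parameters are fixed by hand; in a real system they would be estimated from speech features in the first learning stage before the network is trained on the resulting likelihoods.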
