A Robust Speech Communication into Smart Info-Media System

SUMMARY This paper introduces the noise-robust speech communication techniques we have developed and describes their implementation in a smart info-media system, i.e., a small robot. The designed speech communication system consists of automatic speech detection, recognition, and rejection. With automatic speech detection and recognition, an observed speech waveform can be recognized without a manual trigger. With speech rejection, the system accepts only registered speech phrases and rejects all other words. In other words, although an arbitrary input speech waveform can be fed into the system and recognized, the system responds only to the registered phrases. The developed noise-robust speech processing reduces various kinds of noise in many environments. In addition to the design of noise-robust speech recognition, we present the LSI design of this system: by designing a speech recognition application-specific IC (ASIC), we simultaneously realize low power consumption and real-time processing. This paper describes the LSI architecture of the system and its performance in several field experiments. The current system achieves speech recognition accuracies of 85–99% under 0–20 dB SNR and echo conditions.
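The trigger-free chain of speech detection, recognition, and rejection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: detection is assumed to be a short-time-energy threshold against a noise floor estimated from the first frames, and rejection is assumed to be a likelihood-ratio test of the best registered phrase against a filler (garbage) model. The function names (frame_energies_db, detect_speech, recognize_with_rejection, respond), the per-phrase scorer callables, and all threshold values are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical scorer: maps a speech segment to a log-likelihood-like score
# (a stand-in for a per-phrase word model; NOT the paper's recognizer).
Scorer = Callable[[Sequence[float]], float]


def frame_energies_db(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Short-time log energy (dB) of each full frame; 1e-12 avoids log10(0)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def detect_speech(signal: Sequence[float], frame_len: int = 256, hop: int = 128,
                  margin_db: float = 10.0, min_frames: int = 5,
                  noise_frames: int = 10) -> Optional[Tuple[int, int]]:
    """Assumed energy-based detector: return (start, end) sample indices of the
    first run of at least min_frames frames lying margin_db above a noise floor
    estimated from the first noise_frames frames (assumed to be non-speech)."""
    energies = frame_energies_db(signal, frame_len, hop)
    if len(energies) < noise_frames:
        return None
    threshold = sum(energies[:noise_frames]) / noise_frames + margin_db
    run_start = None
    # The -inf sentinel closes a speech run that lasts until the final frame.
    for i, e in enumerate(energies + [float("-inf")]):
        if e > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                return run_start * hop, (i - 1) * hop + frame_len
            run_start = None  # too short: treat as a noise burst
    return None


def recognize_with_rejection(segment: Sequence[float], scorers: Dict[str, Scorer],
                             filler: Scorer, reject_margin: float) -> Optional[str]:
    """Pick the best registered phrase, then reject it unless its score beats the
    filler (garbage) model by reject_margin (assumed likelihood-ratio rule)."""
    if not scorers:
        return None
    scores = {phrase: score(segment) for phrase, score in scorers.items()}
    best = max(scores, key=lambda p: scores[p])
    if scores[best] - filler(segment) < reject_margin:
        return None
    return best


def respond(signal: Sequence[float], scorers: Dict[str, Scorer], filler: Scorer,
            reject_margin: float = 5.0) -> Optional[str]:
    """Trigger-free pipeline: detect -> recognize -> reject. None means no response."""
    segment = detect_speech(signal)
    if segment is None:
        return None
    start, end = segment
    return recognize_with_rejection(signal[start:end], scorers, filler, reject_margin)


# Usage: quiet / loud / quiet synthetic input with two registered phrases.
sig = [0.001] * 2560 + [0.5] * 2560 + [0.001] * 2560
phrases = {"hello": lambda s: 10.0, "stop": lambda s: 2.0}
print(detect_speech(sig))                     # (2432, 5248)
print(respond(sig, phrases, lambda s: 0.0))   # hello
print(respond(sig, phrases, lambda s: 8.0))   # None (rejected)
```

Returning None models the robot staying silent: either no speech segment was found, or the best-matching phrase did not beat the filler model by reject_margin. Because detection runs on the raw stream, no manual trigger is needed, and because rejection is a separate step, arbitrary input can be processed while only registered phrases produce a response.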
