Enhanced Running Spectrum Analysis for Robust Speech Recognition Under Adverse Conditions: A Case Study on Japanese Speech

In real environments, noise degrades the performance of Automatic Speech Recognition (ASR) systems. Similar-sounding pronunciations make high recognition accuracy even harder to achieve. To address both problems, this work proposes an enhanced algorithm that processes the speech modulation spectrum, building on Running Spectrum Analysis (RSA), and applies it to observed speech data. The proposed modulation spectrum filtering (MSF) method directly modifies the observed cepstral modulation spectrum, which is obtained by Fourier-transforming the time trajectories of the cepstral coefficients. Experiments with various passbands showed an improvement of about 1-4% in recognition accuracy compared to conventional methods.
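The filtering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `modulation_spectrum_filter`, the 100 Hz frame rate, and the 1-16 Hz passband are assumptions chosen for illustration, and a simple rectangular mask stands in for whatever filter shape the method actually uses. The sketch takes a matrix of cepstral coefficients (frames by coefficients), Fourier-transforms each coefficient's trajectory over time, zeroes the modulation frequencies outside the passband, and transforms back.

```python
import numpy as np


def modulation_spectrum_filter(cepstra, frame_rate_hz=100.0, passband_hz=(1.0, 16.0)):
    """Band-pass filter cepstral trajectories in the modulation-frequency domain.

    cepstra       : array of shape (n_frames, n_coeffs), one cepstral vector per frame.
    frame_rate_hz : frames per second (assumed value; depends on the frame shift).
    passband_hz   : (low, high) modulation frequencies to keep (assumed values).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    n_frames = cepstra.shape[0]

    # Fourier transform along the time axis: one modulation spectrum per coefficient.
    spectrum = np.fft.rfft(cepstra, axis=0)
    mod_freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate_hz)

    # Rectangular passband: zero every modulation frequency outside [low, high].
    low, high = passband_hz
    keep = (mod_freqs >= low) & (mod_freqs <= high)
    spectrum[~keep, :] = 0.0

    # Back to the time domain; n=n_frames preserves the original length.
    return np.fft.irfft(spectrum, n=n_frames, axis=0)


if __name__ == "__main__":
    # Synthetic trajectory: a 4 Hz speech-like modulation plus a constant
    # offset and a 30 Hz fluctuation, both outside the assumed passband.
    t = np.arange(200) / 100.0
    slow = np.sin(2 * np.pi * 4.0 * t)
    noisy = slow + 3.0 + 0.5 * np.sin(2 * np.pi * 30.0 * t)
    filtered = modulation_spectrum_filter(noisy[:, None])
    print(np.max(np.abs(filtered[:, 0] - slow)))
```

In this toy case the constant offset (modulation frequency 0 Hz) and the 30 Hz component fall outside the passband and are removed, while the 4 Hz component passes through. Varying `passband_hz` corresponds to the passband experiments mentioned in the abstract, though the specific bands tested are not given here.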
