Non-stationary feature extraction for automatic speech recognition

Current speech recognition systems rely mainly on features derived from the Short-Time Fourier Transform, such as MFCC. This paper drops the assumption that voiced speech is stationary over short time intervals and introduces non-stationary signal analysis into the ASR framework. We present new acoustic features extracted by a pitch-adaptive Gammatone filter bank. Their noise robustness was demonstrated on the AURORA 2 and 4 tasks, where the proposed features outperform standard MFCC. Furthermore, successful combination experiments using ROVER indicate that the new features differ from MFCC in the information they capture.
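To make the idea of a pitch-adaptive Gammatone filter bank more concrete, the following is a minimal sketch and not the paper's implementation. It uses the standard fourth-order Gammatone impulse response with the Glasberg and Moore ERB bandwidth. Placing the center frequencies at harmonics of a given pitch estimate is an assumption made here for illustration, since the abstract does not specify the adaptation scheme. The function names (`gammatone_ir`, `harmonic_filterbank`, `band_energies`) and all default parameters are hypothetical.

```python
import numpy as np


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz, duration_s=0.05, order=4):
    """Peak-normalized Gammatone impulse response centered at fc_hz."""
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    b = 1.019 * erb(fc_hz)  # bandwidth parameter of the envelope
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc_hz * t)
    return g / np.max(np.abs(g))


def harmonic_filterbank(f0_hz, fs_hz, n_harmonics=8):
    """Assumption: one filter per harmonic of the pitch estimate f0_hz."""
    centers = f0_hz * np.arange(1, n_harmonics + 1)
    centers = centers[centers < fs_hz / 2]  # keep filters below Nyquist
    return centers, [gammatone_ir(fc, fs_hz) for fc in centers]


def band_energies(frame, f0_hz, fs_hz, n_harmonics=8):
    """Log energy of the frame in each pitch-aligned Gammatone band."""
    centers, bank = harmonic_filterbank(f0_hz, fs_hz, n_harmonics)
    energies = np.array(
        [np.sum(np.convolve(frame, h, mode="same") ** 2) for h in bank]
    )
    return centers, np.log(energies + 1e-12)
```

In this sketch the filter centers move with the pitch estimate from frame to frame, whereas a conventional Gammatone or mel filter bank keeps them fixed. A real front end would also need a pitch tracker and a strategy for unvoiced frames, both of which are outside the scope of this illustration.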
