Ear-model derived features for automatic speech recognition

The paper provides a theoretical justification that gravity centers (GCs) of frequency bands computed from zero-crossing information are far more robust to additive telephone noise than GCs computed from FFT spectra. Experiments on two different corpora confirm the theoretical results when GCs are added to standard mel-frequency cepstral coefficients (MFCCs) and their time derivatives. On a large telephone corpus of Italian cities with an average signal-to-noise ratio (SNR) of 15 dB, computing GCs from zero-crossings yields a 20.1% word error rate reduction, whereas computing them from FFT spectra degrades performance.
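To make the two kinds of band gravity center concrete, here is a sketch that computes one band's GC in two ways. The first is the power-weighted mean frequency of the DFT bins inside the band. The second comes from the zero-crossings of the signal after band-pass filtering, using the fact that consecutive zero-crossings of a narrowband signal are about half a period apart. This is a minimal illustration under stated assumptions, not the authors' ear-model front end. The following are all hypothetical choices made for this example: the second-order band-pass biquad (RBJ audio-EQ cookbook form), the linear interpolation of crossing instants, the band edges, the fallback to the band midpoint, and the function names `dft_power`, `fft_band_gc`, `bandpass` and `zero_crossing_gc`. A naive DFT stands in for an FFT so that the code needs only the standard library.

```python
import math


def dft_power(frame):
    """Power |X[k]|^2 for bins k = 0..N/2 via a naive DFT (stand-in for an FFT)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(re * re + im * im)
    return power


def fft_band_gc(frame, fs, f_lo, f_hi):
    """Spectral GC: power-weighted mean frequency of the bins in [f_lo, f_hi)."""
    n = len(frame)
    num = den = 0.0
    for k, p in enumerate(dft_power(frame)):
        f = k * fs / n
        if f_lo <= f < f_hi:
            num += f * p
            den += p
    # Hypothetical fallback: band midpoint when the band holds no energy.
    return num / den if den > 0 else 0.5 * (f_lo + f_hi)


def bandpass(signal, fs, f_lo, f_hi):
    """Second-order band-pass biquad (RBJ cookbook, 0 dB peak gain); an assumed filter."""
    f0 = math.sqrt(f_lo * f_hi)          # geometric band center
    q = f0 / (f_hi - f_lo)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha               # b1 is zero for this filter
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def zero_crossing_gc(signal, fs, f_lo, f_hi):
    """Band frequency estimated from zero-crossings of the band-passed signal."""
    y = bandpass(signal, fs, f_lo, f_hi)
    crossings = []
    for i in range(len(y) - 1):
        a, b = y[i], y[i + 1]
        if (a < 0) != (b < 0):
            # Linearly interpolate the crossing instant between samples i and i+1.
            crossings.append(i + a / (a - b))
    if len(crossings) < 2:
        return 0.5 * (f_lo + f_hi)       # hypothetical fallback, as above
    span = (crossings[-1] - crossings[0]) / fs   # seconds
    # Consecutive crossings are half a period apart.
    return (len(crossings) - 1) / (2 * span)


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / fs + 0.3) for t in range(800)]
    print("FFT-based GC:", fft_band_gc(tone[:256], fs, 500, 1500))
    print("zero-crossing GC:", zero_crossing_gc(tone, fs, 500, 1500))
```

On a clean 1 kHz tone both estimates should come out close to 1000 Hz. The zero-crossing estimate depends only on where the band-limited waveform changes sign, not on the magnitudes of individual spectral bins. The sketch does not reproduce the paper's noise-robustness analysis.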
