A high-performance auditory feature for robust speech recognition

An auditory feature extraction algorithm for robust speech recognition in adverse acoustic environments is proposed. Based on an analysis of the human auditory system, the feature extraction algorithm comprises several modules: FFT, an outer-and-middle-ear transfer function, frequency conversion from the linear to the Bark scale, auditory filtering, a nonlinearity, and the discrete cosine transform. Three connected-digit recognition experiments were conducted over wireless and land-line channels using handset and hands-free microphones. Compared with LPCC and MFCC features, the proposed feature achieved average error-rate reductions of 11% to 23% across the handset and hands-free acoustic environments.
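The module chain described above can be sketched as a single-frame feature extractor. This is a minimal illustration, not the authors' implementation: the outer/middle-ear weighting, the triangular Bark-band filter shape, the cube-root nonlinearity (borrowed from PLP-style analysis), and all parameter values (sampling rate, band and coefficient counts) are assumptions; only the Bark-scale formula is the standard Zwicker approximation.

```python
import numpy as np

def hz_to_bark(f):
    # Zwicker's analytical approximation of the Bark critical-band scale.
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def auditory_features(frame, fs=8000, n_bands=18, n_ceps=12):
    """Sketch of the abstract's pipeline for one windowed speech frame."""
    # 1. FFT power spectrum of the Hamming-windowed frame.
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)

    # 2. Outer/middle-ear transfer function (placeholder: a simple
    #    high-pass emphasis; the paper's actual function is not given here).
    spec = spec * (freqs / (freqs + 100.0)) ** 2

    # 3+4. Linear-to-Bark frequency conversion, then auditory filtering
    #      with assumed triangular filters spaced evenly on the Bark axis.
    barks = hz_to_bark(freqs)
    centers = np.linspace(barks[1], barks[-1], n_bands)
    width = centers[1] - centers[0]
    energies = np.empty(n_bands)
    for i, c in enumerate(centers):
        resp = np.clip(1.0 - np.abs(barks - c) / width, 0.0, None)
        energies[i] = np.sum(resp * spec)

    # 5. Nonlinearity (assumed cube-root compression, as in PLP analysis).
    energies = np.cbrt(np.maximum(energies, 1e-10))

    # 6. DCT-II to decorrelate the band energies into cepstral features.
    n = np.arange(n_bands)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_bands))
    return basis @ energies
```

A 256-sample frame at 8 kHz would yield a 12-dimensional feature vector per call; delta and acceleration coefficients would typically be appended downstream.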
