Data-driven design of front-end filter bank for Lombard speech recognition

Adverse environments not only corrupt the speech signal with additive and convolutional noise, which a number of suppression algorithms can address successfully, but also affect the way speech is produced. Speech production variations introduced by a speaker in reaction to a noisy background (the Lombard effect) may severely degrade automatic speech recognition performance. This paper contributes to solving the Lombard speech recognition problem by providing a robust filter bank for use in front-ends. Cepstral features derived from the proposed filter bank are shown to significantly outperform conventional cepstral features.
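To make the phrase "cepstral features derived from a filter bank" concrete, the following is a minimal sketch of the generic pipeline: filter-bank energies, logarithm, then a DCT-II. It is not the filter bank proposed in the paper, whose layout is not given in this abstract. The triangular filter shape, the equally spaced band edges, and the names `triangular_filter_bank` and `cepstral_features` are all assumptions chosen for illustration.

```python
import math


def triangular_filter_bank(edges_hz, n_bins, sample_rate):
    """Build triangular filters from a list of band edges in Hz.

    Filter i rises from edges_hz[i] to edges_hz[i + 1] and falls back to
    zero at edges_hz[i + 2]. The edges are the free design parameter; the
    values used below are hypothetical, not taken from the paper.
    """
    nyquist = sample_rate / 2.0
    # Centre frequency of each bin of a one-sided power spectrum.
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for lo, mid, hi in zip(edges_hz, edges_hz[1:], edges_hz[2:]):
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def cepstral_features(power_spectrum, bank, n_ceps):
    """Log filter-bank energies followed by an unnormalized DCT-II."""
    floor = 1e-10  # avoids log(0) for empty bands
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(filt, power_spectrum)), floor))
        for filt in bank
    ]
    n = len(log_energies)
    return [
        sum(log_energies[j] * math.cos(math.pi * c * (j + 0.5) / n)
            for j in range(n))
        for c in range(n_ceps)
    ]


# Hypothetical layout: 7 equally wide bands over 0-4 kHz (8 kHz sampling).
edges = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
bank = triangular_filter_bank(edges, n_bins=129, sample_rate=8000)

# A flat power spectrum stands in for one analysis frame.
spectrum = [1.0] * 129
ceps = cepstral_features(spectrum, bank, n_ceps=5)
```

In this sketch, changing the `edges` list is the only thing needed to try a different band layout, which is the kind of parameter a data-driven design would tune. With the flat test spectrum every band has the same energy, so only the zeroth coefficient is non-zero.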
