Automatic speech/speaker recognition in noisy environments using wavelet transform

Feature extraction is a crucial step in pattern recognition in general and in speech/speaker recognition in particular, and the extracted features must be robust to the most common types of noise. This paper presents a feature extraction technique based on the discrete wavelet transform for multi-band automatic speech/speaker recognition. Experimental results show that this technique performs comparably to a conventional full-band technique under matched conditions (clean speech for both training and testing). Under mismatched conditions (clean speech for training, noisy speech for testing), the two techniques are complementary: combining the features extracted by each yields better recognition rates, especially at low signal-to-noise ratios.
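
As a rough illustration of the idea of wavelet-based multi-band features, the sketch below splits one speech frame into sub-bands with a discrete wavelet transform and appends the per-band features to a full-band feature. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Haar wavelet, the three decomposition levels, the use of log energy as the feature, the combination by simple concatenation, and all function names.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:
        x = x + [x[-1]]  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def dwt_subbands(frame, levels=3):
    """Split a frame into levels + 1 sub-bands.

    Order: [detail_1 (highest band), ..., detail_L, approx_L (lowest band)].
    """
    bands, approx = [], list(frame)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def log_energy(band, eps=1e-12):
    """Log energy of a band; eps avoids log(0) on silent bands."""
    return math.log(sum(v * v for v in band) + eps)


def multiband_features(frame, levels=3):
    """One log-energy value per wavelet sub-band (assumed feature choice)."""
    return [log_energy(b) for b in dwt_subbands(frame, levels)]


def combined_features(frame, levels=3):
    """Concatenate a full-band feature with the sub-band features."""
    return [log_energy(frame)] + multiband_features(frame, levels)


# Hypothetical 16-sample frame, used only to exercise the functions.
frame = [math.sin(0.3 * n) for n in range(16)]
features = combined_features(frame, levels=3)
```

Because the Haar transform used here is orthonormal, the sub-band energies of an even-length frame sum to the full-band energy, which makes the decomposition easy to sanity-check. A practical front end would more likely compute cepstral-style features per band and would choose the wavelet and the number of levels to match the desired frequency resolution.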
