Adaptive robust speech processing based on acoustic noise estimation and classification

The paper presents an adaptive system for speech signal processing in the presence of loud background noise. The validity of the approach is confirmed by implementing a classification system for voiced and unvoiced (V/UV) speech frames. Genetic algorithms were used to select the parameters that offer the best V/UV classification in the presence of 4 different types of background noise and at 5 different SNRs. 20 neural network-based classification systems were then implemented, one for each combination of noise type and SNR, and the system to apply is chosen dynamically, frame by frame, according to the output of a background noise recognition system and an SNR estimation system. The system was implemented and tested using the TIMIT speech corpus and its phonetic classification. The results were compared with a non-adaptive classification system and with the V/UV detectors adopted by three important speech coding standards: LPC10, ITU-T G.723.1 and ETSI AMR. In all cases the adaptive V/UV classifier clearly outperformed the others, confirming the validity of the adaptive approach.
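The frame-by-frame selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the noise labels, the SNR grid, the nearest-SNR selection rule, and the energy-threshold stub that stands in for each trained neural network are all assumptions made for illustration, since the abstract gives only the counts (4 noise types, 5 SNRs, 20 classifiers).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labels and SNR grid: the abstract states only that there are
# 4 noise types and 5 SNRs, not what they are.
NOISE_TYPES = ["noise_a", "noise_b", "noise_c", "noise_d"]
SNR_LEVELS_DB = [0.0, 5.0, 10.0, 15.0, 20.0]

# A classifier maps one frame of samples to True (voiced) or False (unvoiced).
Classifier = Callable[[List[float]], bool]


def make_stub_classifier(threshold: float) -> Classifier:
    """Stand-in for a trained neural network: thresholds the mean frame energy."""
    def classify(frame: List[float]) -> bool:
        energy = sum(x * x for x in frame) / len(frame)
        return energy > threshold
    return classify


def build_bank() -> Dict[Tuple[str, float], Classifier]:
    """Build one classifier per (noise type, SNR) pair: 4 x 5 = 20 in total."""
    bank: Dict[Tuple[str, float], Classifier] = {}
    for i, noise in enumerate(NOISE_TYPES):
        for snr in SNR_LEVELS_DB:
            # Arbitrary placeholder threshold, not a value from the paper.
            threshold = 0.01 * (i + 1) + 0.5 / (snr + 5.0)
            bank[(noise, snr)] = make_stub_classifier(threshold)
    return bank


def nearest_snr(estimated_snr_db: float) -> float:
    """Map an SNR estimate to the closest level a classifier exists for (assumed rule)."""
    return min(SNR_LEVELS_DB, key=lambda s: abs(s - estimated_snr_db))


def classify_frame(
    bank: Dict[Tuple[str, float], Classifier],
    frame: List[float],
    noise_label: str,
    estimated_snr_db: float,
) -> bool:
    """Pick the classifier matching the current noise estimate, then apply it."""
    classifier = bank[(noise_label, nearest_snr(estimated_snr_db))]
    return classifier(frame)
```

In use, `noise_label` and `estimated_snr_db` would come from the background noise recognition and SNR estimation stages for each frame, so the active classifier can change from one frame to the next.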
