Incorporating the voicing information into HMM-based automatic speech recognition

In this paper, we propose a novel model for incorporating voicing information into a speech recognition system. The voicing information is estimated by a novel method that provides it for each filter-bank channel without requiring any information about the fundamental frequency. A Viterbi-style training procedure is used to estimate the voicing probability of each mixture at each HMM state. Experiments are performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved at low SNRs when the voicing information is incorporated into the standard model and into two models that already compensate for the effect of noise.
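
The Viterbi-style estimation step mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that each frame has already been aligned to a single (state, mixture) pair by a Viterbi pass, that the voicing estimate is a binary mask per filter-bank channel, and that the voicing probability is a smoothed relative frequency. The function name `estimate_voicing_probs`, its parameters, and the additive smoothing are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def estimate_voicing_probs(alignment, voicing_masks, num_channels, smoothing=1.0):
    """Estimate P(voiced | state, mixture, channel) by counting.

    alignment     : list of (state, mixture) tuples, one per frame
                    (assumed to come from a Viterbi alignment).
    voicing_masks : list of per-frame lists of 0/1 flags, one per channel
                    (assumed binary voicing estimates).
    smoothing     : additive smoothing constant (an assumption, not from the paper).
    """
    voiced_counts = defaultdict(lambda: [0.0] * num_channels)
    frame_counts = defaultdict(float)

    # Accumulate, for every (state, mixture), how often each channel was voiced.
    for key, mask in zip(alignment, voicing_masks):
        frame_counts[key] += 1.0
        counts = voiced_counts[key]
        for ch in range(num_channels):
            counts[ch] += mask[ch]

    # Convert counts to smoothed relative frequencies.
    probs = {}
    for key, n in frame_counts.items():
        probs[key] = [
            (voiced_counts[key][ch] + smoothing) / (n + 2.0 * smoothing)
            for ch in range(num_channels)
        ]
    return probs


if __name__ == "__main__":
    # Toy data: three frames, two filter-bank channels.
    toy_alignment = [(0, 0), (0, 0), (1, 0)]
    toy_masks = [[1, 0], [1, 1], [0, 0]]
    print(estimate_voicing_probs(toy_alignment, toy_masks, num_channels=2))
```

In a full system, the alignment and the probability estimates would typically be refined over several training iterations; the sketch shows only a single counting pass.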
