Robust speech recognition using a voiced-unvoiced feature

In this paper, a voiced-unvoiced measure is used as an acoustic feature for continuous speech recognition. The voiced-unvoiced measure is combined with the standard Mel Frequency Cepstral Coefficients (MFCC) using linear discriminant analysis (LDA) to select the most relevant features. Experiments were performed on the SieTill corpus (German digit strings recorded over telephone lines) and on the SPINE corpus (English spontaneous speech under different simulated noisy environments). The additional voiced-unvoiced measure yields improvements in word error rate (WER) of up to 11% relative over MFCC alone, with the same overall number of parameters in the system.
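The LDA-based combination described above can be illustrated with a minimal sketch. The data here is synthetic: the paper's actual voiced-unvoiced measure is computed from the speech signal, and the class labels in a real system would come from acoustic states; both are stand-ins for illustration. The sketch appends a V/UV value to each MFCC frame and finds the Fisher LDA direction that best separates two classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame-level features: 12 MFCCs plus one voiced-unvoiced
# (V/UV) measure per frame (synthetic data for illustration only).
n_frames = 200
mfcc = rng.normal(size=(n_frames, 12))
vuv = rng.uniform(size=(n_frames, 1))   # V/UV measure in [0, 1]
features = np.hstack([mfcc, vuv])       # combined 13-dim feature vector

# Two synthetic classes standing in for acoustic states.
labels = (vuv[:, 0] > 0.5).astype(int)

# Fisher LDA for two classes: project onto the direction maximizing
# between-class scatter relative to within-class scatter.
mu0 = features[labels == 0].mean(axis=0)
mu1 = features[labels == 1].mean(axis=0)
Sw = (np.cov(features[labels == 0], rowvar=False)
      + np.cov(features[labels == 1], rowvar=False))
w = np.linalg.solve(Sw, mu1 - mu0)      # LDA direction
projected = features @ w                # most discriminative 1-D feature
```

In the paper's setting, LDA is applied to the full combined feature stream so that the recognizer keeps the same number of parameters while the projection retains the discriminative information the V/UV measure adds.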
