Extraction methods of voicing feature for robust speech recognition

In this paper, three different voicing features are studied as additional acoustic features for continuous speech recognition. The harmonic product spectrum based feature is extracted in the frequency domain, while the autocorrelation and average magnitude difference based methods work in the time domain. Each algorithm produces a measure of voicing for every time frame. The voicing measure is combined with the standard Mel Frequency Cepstral Coefficients (MFCC) using linear discriminant analysis to select the most relevant features. Experiments were performed on small and large vocabulary tasks. The three voicing measures combined with MFCCs yielded similar improvements in word error rate: up to 14% relative on the small-vocabulary task and up to 6% relative on the large-vocabulary task compared with MFCCs alone, with the same overall number of parameters in the system.
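The abstract does not give implementation details for the voicing measures, but the autocorrelation-based variant can be illustrated with a minimal sketch: per frame, the peak of the normalized autocorrelation within a plausible pitch lag range serves as a voicing score (near 1 for voiced speech, small for unvoiced). The function name, the 60-400 Hz pitch range, and the framing parameters below are assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def autocorr_voicing(frame, fs, fmin=60.0, fmax=400.0):
    """Frame-level voicing measure from the normalized autocorrelation peak.

    frame : 1-D array of samples for one analysis window
    fs    : sampling rate in Hz
    fmin, fmax : assumed pitch search range in Hz
    """
    frame = frame - np.mean(frame)                     # remove DC offset
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:                                   # silent frame
        return 0.0
    ac = ac / ac[0]                                    # normalize so lag 0 equals 1
    lo = int(fs / fmax)                                # shortest candidate pitch lag
    hi = min(int(fs / fmin), len(ac) - 1)              # longest candidate pitch lag
    return float(np.max(ac[lo:hi + 1]))                # ~1 for voiced, small for unvoiced

# Example: one voicing value per 25 ms frame with a 10 ms shift (assumed framing)
# voicing = [autocorr_voicing(sig[i:i + 400], 16000) for i in range(0, len(sig) - 400, 160)]
```

In the paper's setup, such a per-frame voicing value is appended to the MFCC vector and the combined feature stream is then projected with linear discriminant analysis; the sketch above only covers the voicing extraction step.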
