Automatic speech recognition based on cepstral coefficients and a mel-based discrete energy operator

In this paper, a novel feature vector based on both mel frequency cepstral coefficients (MFCCs) and a mel-based nonlinear discrete-time energy operator (MDEO) is proposed to be used as the input of an HMM-based automatic continuous speech recognition (ACSR) system. Our goal is to improve the performance of such a recognizer using the new feature vector. Experiments show that the use of the new feature vector increases the recognition rate of the ACSR system. The HTK hidden Markov model toolkit was used throughout. Experiments were done on both the TIMIT and NTIMIT databases. For the TIMIT database, when the MDEO was included in the feature vector to test a multi-speaker ACSR system, we found that the error rate decreased by about 9.51%. On the other hand, for NTIMIT, the MDEO deteriorates the performance of the recognizer. That is, the new feature vector is useful for clean speech but not for telephone speech.

[1]  J. F. Kaiser,et al.  On a simple algorithm to calculate the 'energy' of a signal , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[2]  Petros Maragos,et al.  Energy separation in signal modulations with application to speech analysis , 1993, IEEE Trans. Signal Process..

[3]  Joseph Picone,et al.  Signal modeling techniques in speech recognition , 1993, Proc. IEEE.

[4]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[5]  E. Zwicker,et al.  Analytical expressions for critical‐band rate and critical bandwidth as a function of frequency , 1980 .

[6]  Petros Maragos,et al.  On amplitude and frequency demodulation using energy operators , 1993, IEEE Trans. Signal Process..

[7]  Sara H. Basson,et al.  NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database , 1990, International Conference on Acoustics, Speech, and Signal Processing.