Experiments on a parametric nonlinear spectral warping for an HMM-based speech recognizer

This paper addresses the search for an optimal feature set for a speech recognition system. An acoustic feature analysis that consistently enhances the semantic information in the signal can significantly reduce the raw-score (no-grammar) error rate. A simple feature set controlled by two parameters is proposed. It is compared against a standard mel-cepstrum, LPC-based feature set in a talker-independent, connected-alphadigit, HMM-based recognizer. The results show that a particular combination of parameters yields a significantly lower error rate than the baseline mel-cepstrum, LPC-based feature set.
