Optimizing feature extraction for speech recognition

We propose a method to minimize the loss of information during the feature extraction stage of speech recognition by optimizing the parameters of the mel-cepstrum transformation, which is widely used in speech recognition. The mel-cepstrum is typically obtained from a bank of critical-band filters, whose characteristics play an important role in converting a speech signal into a sequence of feature vectors. First, we analyze how the performance of the mel-cepstrum changes as the filter parameters (shape, center frequency, and bandwidth) are varied. We then propose an algorithm that optimizes these filter parameters using the simplex method. Experiments on Korean digit words show that the recognition rate improves by about 4-7%.
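
To make the role of the filter parameters concrete, the following is a minimal sketch of a mel-cepstrum computation in which each filter is described by an explicit (center frequency, bandwidth) pair. It is not the implementation used in this work: the symmetric triangular shape, the common 2595·log10(1 + f/700) mel formula, the DCT-II, and all function names are assumptions chosen for illustration.

```python
import math

def hz_to_mel(f_hz):
    # A common mel-scale formula (assumed here; not specified in the abstract).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def triangular_filter(center_hz, bandwidth_hz, bin_freqs_hz):
    # Symmetric triangle: weight 1 at the center, 0 at center +/- bandwidth/2.
    half = bandwidth_hz / 2.0
    return [max(0.0, 1.0 - abs(f - center_hz) / half) for f in bin_freqs_hz]

def default_filter_params(n_filters, f_max_hz):
    # Starting point: centers equally spaced on the mel scale, with each
    # bandwidth set to the distance between the two neighbouring centers.
    mel_max = hz_to_mel(f_max_hz)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    return [(edges[i], edges[i + 1] - edges[i - 1])
            for i in range(1, n_filters + 1)]

def mel_cepstrum(power_spectrum, sample_rate, filter_params, n_ceps):
    # power_spectrum: one frame, bins covering 0 .. sample_rate / 2.
    n_bins = len(power_spectrum)
    bin_freqs = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    log_energies = []
    for center_hz, bandwidth_hz in filter_params:
        weights = triangular_filter(center_hz, bandwidth_hz, bin_freqs)
        energy = sum(w * p for w, p in zip(weights, power_spectrum))
        log_energies.append(math.log(energy + 1e-10))  # floor avoids log(0)
    # DCT-II of the log filter-bank energies gives the cepstral coefficients.
    m = len(log_energies)
    return [sum(log_energies[j] * math.cos(math.pi * n * (j + 0.5) / m)
                for j in range(m))
            for n in range(n_ceps)]

if __name__ == "__main__":
    frame = [1.0] * 129                       # flat toy spectrum
    params = default_filter_params(8, 4000.0)
    print(mel_cepstrum(frame, 8000, params, 4))
```

In this sketch the list of (center, bandwidth) pairs is the quantity that a derivative-free search such as a simplex method could adjust, with the recognition error of a trained recognizer as the objective. The objective itself is not shown because it depends on the recognizer and data.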
