Speaker Independent Word Recognition Using Cepstral Distance Measurement

Speech recognition has been developed from theoretical methods practical systems. Since 90’s people have moved their interests to the difficult task of Large Vocabulary Continuous Speech Recognition (LVCSR) and indeed achieved a great progress. Meanwhile, many well-known research and commercial institutes have established their recognition systems including via Voice system IBM, Whisper system by Microsoft etc. In this paper we have developed a simple and efficient algorithm for the recognition of speech signal for speaker independent isolated word recognition system. We use Mel frequency cepstral coefficients (MFCCs) as features of the recorded speech. A decoding algorithm is proposed for recognizing the target speech computing the cepstral distance of the cepstral coefficients. Simulation experiments were carried using MATLAB here the method produced relatively good (85% word recognition accuracy) results.

[1]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[2]  X. D. Huang,et al.  Phoneme classification using semicontinuous hidden Markov models , 1992, IEEE Trans. Signal Process..

[3]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[4]  Tom E. Bishop,et al.  Blind Image Restoration Using a Block-Stationary Signal Model , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[5]  Hermann Ney,et al.  Word graphs: an efficient interface between continuous-speech recognition and language understanding , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Kaare Brandt Petersen,et al.  Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music , 2006, ISMIR.

[7]  Alejandro Acero,et al.  Acoustical and environmental robustness in automatic speech recognition , 1991 .

[8]  Juergen Luettin,et al.  Audio-visual speech recognition , 2000 .

[9]  Juergen Luettin,et al.  Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..

[10]  Juergen Luettin,et al.  Asynchronous stream modeling for large vocabulary audio-visual speech recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[11]  Steve Young,et al.  The general use of tying in phoneme-based HMM speech recognisers , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Ronald W. Schafer,et al.  Digital Processing of Speech Signals , 1978 .

[13]  Kevin P. Murphy,et al.  A coupled HMM for audio-visual speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Steve Young,et al.  A review of large-vocabulary continuous-speech recognition , 1996 .

[15]  L. R. Rabiner,et al.  Recognition of isolated digits using hidden Markov models with continuous mixture densities , 1985, AT&T Technical Journal.

[16]  Steve Young,et al.  A review of large-vocabulary continuous-speech , 1996, IEEE Signal Process. Mag..