Spoken Word Recognition Using Hidden Markov Model

The main aim of this project is to develop an isolated spoken word recognition system using a Hidden Markov Model (HMM) that achieves good accuracy across the full frequency range of the human voice. Ten different words are recorded by different speakers, both male and female, and the results are compared across different feature extraction methods. Earlier work recognized seven short utterances using an HMM with only one feature extraction method. The spoken word recognition system is divided into two major blocks. The first covers recording the database and extracting features from the recorded signals. The feature extraction methods used are Mel frequency cepstral coefficients, linear frequency cepstral coefficients, and fundamental frequency. To obtain Mel frequency cepstral coefficients, the signal passes through the following stages: pre-emphasis, framing, windowing, fast Fourier transform, filter bank, and discrete cosine transform. Linear frequency cepstral coefficients are computed in the same way but do not use the Mel frequency scale. The second block describes the HMM used for modeling and recognizing the spoken words. All the training samples are clustered using the K-means algorithm. The modeling parameters are the means, variances, and weights of the Gaussian mixtures. The Baum-Welch algorithm is used to train on the samples and re-estimate the parameters. Finally, the Viterbi algorithm finds the state sequence that best matches the given observation sequence, that is, the spoken utterance to be recognized. All simulations were carried out in MATLAB on the Microsoft Windows 7 operating system.
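The feature extraction chain described above (pre-emphasis, framing, windowing, FFT, filter bank, DCT) can be illustrated with a minimal NumPy sketch. This is not the authors' MATLAB implementation. The function names `filter_bank` and `cepstral_features` are hypothetical, and the pre-emphasis coefficient of 0.97, the 25 ms frames with a 10 ms step, the Hamming window, the 26 triangular filters, and the 13 retained coefficients are all assumed values that the abstract does not state. Setting `mel_scale=False` spaces the filters linearly, which gives a rough linear frequency cepstral variant.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_bank(n_filters, n_fft, sample_rate, mel_scale=True):
    """Triangular filters spaced on the mel scale (MFCC) or linearly (LFCC)."""
    high = sample_rate / 2.0
    if mel_scale:
        edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(high), n_filters + 2))
    else:
        edges_hz = np.linspace(0.0, high, n_filters + 2)
    bins = np.floor((n_fft + 1) * edges_hz / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):       # rising slope
            bank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling slope, peak of 1 at centre
            bank[i, k] = (right - k) / (right - centre)
    return bank

def cepstral_features(signal, sample_rate, frame_len=0.025, frame_step=0.010,
                      n_fft=512, n_filters=26, n_ceps=13, mel_scale=True):
    """Return an array of shape (n_frames, n_ceps). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=float)

    # 1. Pre-emphasis: boost high frequencies (0.97 is an assumed coefficient).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # 2. Framing: overlapping short-time frames (assumes frame length <= n_fft).
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(emphasized) < flen:
        emphasized = np.pad(emphasized, (0, flen - len(emphasized)))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = emphasized[idx]

    # 3. Window function (Hamming is an assumed choice).
    frames = frames * np.hamming(flen)

    # 4. FFT -> power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # 5. Filter bank energies, then log compression.
    bank = filter_bank(n_filters, n_fft, sample_rate, mel_scale)
    log_energies = np.log(power @ bank.T + 1e-10)

    # 6. Discrete cosine transform (unnormalized DCT-II), keep first n_ceps.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T
```

Each row of the returned array is one frame's feature vector. In a system like the one described, such vectors would form the observation sequences that are clustered with K-means and modeled by the HMM.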
