Classification of vowel sounds using MFCC and feed forward Neural Network

English as spoken by Malaysians varies from place to place and differs between ethnic communities and their sub-groups. A dedicated speech-to-text translation system is therefore needed to handle English pronunciation as spoken by Malaysians. Speech translation combines speech recognition with the translation of recognized phonemes into words, and speech recognition is the process of identifying phonemes in a speech segment. This paper proposes the initial step of speech recognition: identifying phoneme features. Mel-frequency cepstral coefficients (MFCC) are computed as the phoneme features. A simple feed-forward neural network (FFNN) trained by back propagation is proposed for classifying these features. The extracted MFCCs are used as input to the neural network classifier, which assigns each input to one of 11 classes.
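The pipeline described above (MFCC extraction followed by a feed-forward classifier trained by back propagation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The sampling rate, frame length, hop size, number of mel filters, number of cepstral coefficients, hidden-layer size, tanh activation, softmax output, and learning rate are all assumptions chosen for illustration; only the use of MFCC inputs, back propagation, and 11 output classes comes from the abstract. The function names `mfcc`, `train_ffnn`, and `predict` are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, mid):
            fb[i - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fb[i - 1, k] = (hi - k) / max(hi - mid, 1)
    return fb

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    # All default parameter values are assumptions, not taken from the paper.
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)           # windowed frames
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2 / frame_len
    log_energy = np.log(power @ mel_filterbank(n_filters, frame_len, sr).T + 1e-10)
    # DCT-II of the log filterbank energies; coefficient 0 is skipped.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    return log_energy @ dct.T                               # (n_frames, n_ceps)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_ffnn(X, y, n_classes=11, n_hidden=16, lr=0.1, epochs=500, seed=0):
    # One hidden tanh layer, softmax output, full-batch gradient descent.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    T = np.eye(n_classes)[y]                                # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                            # forward pass
        P = softmax(H @ W2 + b2)
        dZ = (P - T) / len(X)                               # cross-entropy gradient
        dH = (dZ @ W2.T) * (1.0 - H ** 2)                   # back-propagate through tanh
        W2 -= lr * (H.T @ dZ)
        b2 -= lr * dZ.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

In use, each labelled vowel recording would be passed through `mfcc`, the per-frame coefficients would be standardized (zero mean, unit variance per coefficient) and stacked into `X` with integer labels `y` in the range 0 to 10, and the result of `train_ffnn(X, y)` would be passed to `predict` for new frames. Whether the paper classifies individual frames or per-utterance averages is not stated in the abstract, so that choice is left open here.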
