Measuring the performance of isolated spoken Malay speech recognition using Multi-layer Neural Networks

This paper describes speech signal modeling techniques suited to high-performance, robust isolated word recognition. In this study, a speech recognition system is presented: specifically, an isolated spoken Malay word recognizer that uses spontaneous and formal speech collected from the Parliament of Malaysia. The vocabulary is currently limited to 25 words that are pronounced exactly as they are written, and it controls the distribution of the vocalic segments. Speech segmentation is achieved by adopting an energy-based parameter and a zero-crossing rate measure, modified to better locate the beginning and ending points of each spoken word. Training and recognition are carried out with Multi-layer Perceptron (MLP) neural networks in a two-layer configuration, trained with stochastic error back-propagation, which adjusts the weights and biases after the presentation of every training sample. Mel-frequency Cepstral Coefficients (MFCCs) were chosen as the feature extraction approach and are computed from each segmented utterance as the characteristic features for the word recognizer. Recognition results showed that the performance of the two-layer networks increased as the number of hidden neurons increased. The best network structure achieved an average classification rate of 84.731% with the (150-25) configuration. Implementation results also showed that the conjugate gradient (CG) algorithm was more accurate and reliable than the Levenberg-Marquardt (LM) algorithm for the network complexities and data sizes considered in this study.
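The segmentation step (an energy parameter combined with a zero-crossing rate measure to find where each word starts and ends) can be illustrated with a minimal sketch in the spirit of the classic Rabiner and Sambur endpoint detector. It is not the authors' modified method, whose details are not given here. The frame length, hop, thresholds, the `max_extend` limit, and the assumption that the first few frames contain only background noise are all hypothetical choices made for illustration.

```python
import math
import statistics
from typing import List, Optional, Tuple


def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(
    x: List[float],
    frame_len: int = 256,       # assumed: 32 ms at 8 kHz
    hop: int = 128,             # assumed: 50% frame overlap
    energy_ratio: float = 0.1,  # assumed: threshold at 10% of the noise-to-peak range
    noise_frames: int = 5,      # assumed: leading frames hold only background noise
    zcr_cap: float = 0.25,      # assumed upper bound on the ZCR threshold
    max_extend: int = 25,       # assumed: the ZCR pass adds at most this many frames
) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample) of the detected word, or None."""
    frames = frame_signal(x, frame_len, hop)
    if len(frames) <= noise_frames:
        return None
    energies = [short_time_energy(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]

    # 1. Energy pass: keep frames clearly above the background-noise floor.
    noise_e = statistics.mean(energies[:noise_frames])
    e_thresh = noise_e + energy_ratio * (max(energies) - noise_e)
    active = [i for i, e in enumerate(energies) if e > e_thresh]
    if not active:
        return None
    start, end = active[0], active[-1]

    # 2. ZCR pass: extend outward over low-energy but high-ZCR frames
    #    (e.g. weak fricatives), at most max_extend frames on each side.
    noise_z = zcrs[:noise_frames]
    z_thresh = min(zcr_cap, statistics.mean(noise_z) + 2 * statistics.pstdev(noise_z))
    lo = max(0, start - max_extend)
    while start > lo and zcrs[start - 1] > z_thresh:
        start -= 1
    hi = min(len(frames) - 1, end + max_extend)
    while end < hi and zcrs[end + 1] > z_thresh:
        end += 1

    return start * hop, end * hop + frame_len


if __name__ == "__main__":
    # Toy input: 0.5 s silence, a 0.5 s 500 Hz tone, 0.5 s silence at 8 kHz.
    fs = 8000
    silence = [0.0] * (fs // 2)
    tone = [0.5 * math.sin(2 * math.pi * 500 * n / fs) for n in range(fs // 2)]
    print(detect_endpoints(silence + tone + silence))  # → (3840, 8192)
```

The tone actually spans samples 4000 to 7999; the detected boundaries are quantized to frame edges, so they bracket the word slightly. The energy pass finds the voiced core, and the ZCR pass is what would recover low-energy fricative onsets or offsets in real speech.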
