Efficient MLP constructive training algorithm using a neuron recruiting approach for isolated word recognition system

This paper describes an efficient constructive training algorithm for a Multi Layer Perceptron (MLP) neural network dedicated to Isolated Word Recognition (IWR) systems. An incremental training procedure was employed, based on recruiting new hidden neurons in a single hidden layer. During the Neural Network (NN) training phase, the number of pronunciation samples drawn from the Training Data (TD) was increased sequentially. The proposed MLP constructive training algorithm yielded an optimal NN classifier structure together with an optimized TD size. An isolated word recognition system based on the MLP neural network was then constructed and tested on ten words extracted from the TIMIT database. Mel Frequency Cepstral Coefficient (MFCC) feature extraction was employed, including energy and first and second derivative coefficients. A proposed Frame-by-Frame Neural Network (FFNN) classification method was explored and compared with the Conventional Neural Network (CNN) classification approach. Principal Component Analysis (PCA) was also investigated in order to reduce both the TD size and the complexity of the recognition system. Experimental results showed superior performance of the proposed FFNN classifier compared with its CNN counterpart, illustrated by a significant improvement in recognition rate.
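The neuron-recruiting idea can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the abstract does not give the growth criterion, the learning rule, or the schedule for enlarging the training set, so the mean-squared-error threshold `target_mse`, the cap `max_hidden`, the sigmoid units, the batch gradient descent, and every function name below are assumptions made for illustration. The sketch also keeps the training set fixed, whereas the paper increases it sequentially.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Return hidden activations and outputs of a single-hidden-layer MLP."""
    H = sigmoid(X @ W1 + b1)
    return H, sigmoid(H @ W2 + b2)

def train(X, Y, params, epochs=3000, lr=1.0):
    """Batch gradient descent on the mean squared error (assumed learning rule)."""
    W1, b1, W2, b2 = params
    n = len(X)
    for _ in range(epochs):
        H, O = forward(X, W1, b1, W2, b2)
        dO = (O - Y) * O * (1 - O)          # error signal at the output layer
        dH = (dO @ W2.T) * H * (1 - H)      # error signal at the hidden layer
        W2 = W2 - lr * H.T @ dO / n
        b2 = b2 - lr * dO.mean(axis=0)
        W1 = W1 - lr * X.T @ dH / n
        b1 = b1 - lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def recruit(params, rng, scale=0.5):
    """Append one hidden neuron with random weights; trained weights are kept."""
    W1, b1, W2, b2 = params
    W1 = np.hstack([W1, rng.normal(0.0, scale, (W1.shape[0], 1))])
    b1 = np.append(b1, 0.0)
    W2 = np.vstack([W2, rng.normal(0.0, scale, (1, W2.shape[1]))])
    return W1, b1, W2, b2

def constructive_train(X, Y, target_mse=0.01, max_hidden=8, seed=0):
    """Start with one hidden neuron and recruit more until the (assumed)
    stopping rule is met: training MSE <= target_mse or max_hidden reached."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    params = (rng.normal(0.0, 0.5, (n_in, 1)), np.zeros(1),
              rng.normal(0.0, 0.5, (1, n_out)), np.zeros(n_out))
    while True:
        params = train(X, Y, params)
        mse = float(np.mean((forward(X, *params)[1] - Y) ** 2))
        n_hidden = params[0].shape[1]
        if mse <= target_mse or n_hidden >= max_hidden:
            return params, n_hidden, mse
        params = recruit(params, rng)
```

Calling `constructive_train(X, Y)` with a feature matrix `X` (one row per sample) and a target matrix `Y` of values in [0, 1] (for example one-hot word labels) returns the trained weights, the number of hidden neurons reached, and the final training error. Because `recruit` leaves the existing weights untouched, each growth step resumes from the previously trained network instead of restarting from scratch.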
