Dynamic programming prediction errors of recurrent neural fuzzy networks for speech recognition

This paper proposes Mandarin phrase recognition using dynamic programming (DP) prediction errors of singleton-type recurrent neural fuzzy networks (SRNFNs), a method called DP-SRNFN. The recurrent property of an SRNFN makes it suitable for processing temporal speech patterns. Each Mandarin phrase consists of monosyllabic words, and SRNFNs are trained on word units: N_w SRNFNs model N_w words, and each SRNFN receives the current frame feature and predicts the next frame feature of the word it models. To recognize one of N_P phrases, the prediction error of each trained SRNFN is computed, and DP is used to find, for each of the N_P phrases, the optimal path that maps the input frames to the best-matched SRNFNs (words). The accumulated error of each phrase model is computed along its optimal path, and the phrase with the minimum error is the recognition result. To verify DP-SRNFN performance, experiments were conducted on recognizing 30 Mandarin phrases. SRNFNs were also trained with noisy features for phrase recognition in different noisy environments. DP-SRNFN performance is compared with that of hidden Markov models (HMMs), and the results show that DP-SRNFN achieves higher recognition rates than HMMs in both clean and noisy environments.
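The abstract does not give the SRNFN structure or the exact DP recurrence, so the following is only a minimal Python sketch of the decoding idea under stated assumptions: each trained SRNFN is treated as a black-box one-step predictor (any callable mapping a feature frame to a predicted next frame), the prediction error is assumed to be the squared Euclidean distance, and each word of a phrase is assumed to occupy a contiguous, non-empty run of frames in phrase order with no duration constraints. The function names `prediction_errors`, `phrase_error`, and `recognize` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]
# Assumption: a trained SRNFN is modeled as a plain one-step predictor.
Predictor = Callable[[Frame], Sequence[float]]


def prediction_errors(frames: Sequence[Frame],
                      predictors: Sequence[Predictor]) -> List[List[float]]:
    """E[w][t] = squared error of word model w predicting frame t+1 from frame t."""
    errors = []
    for predict in predictors:
        row = []
        for t in range(len(frames) - 1):
            guess = predict(frames[t])
            row.append(sum((a - b) ** 2 for a, b in zip(frames[t + 1], guess)))
        errors.append(row)
    return errors


def phrase_error(errors: List[List[float]], phrase: Sequence[int]) -> float:
    """Minimum accumulated error over monotonic alignments in which each word
    of `phrase` (a list of word-model indices) covers a contiguous, non-empty
    run of prediction steps, in phrase order. Returns inf if the phrase has
    more words than there are prediction steps."""
    steps = len(errors[0])
    n_words = len(phrase)
    # d[k]: best accumulated error with the current step assigned to phrase[k]
    d = [math.inf] * n_words
    d[0] = errors[phrase[0]][0]
    for t in range(1, steps):
        new = [math.inf] * n_words
        for k in range(n_words):
            stay = d[k]                                # same word continues
            enter = d[k - 1] if k > 0 else math.inf    # move on from previous word
            new[k] = min(stay, enter) + errors[phrase[k]][t]
        d = new
    # The path must end in the last word of the phrase.
    return d[-1]


def recognize(frames: Sequence[Frame],
              predictors: Sequence[Predictor],
              phrases: Sequence[Sequence[int]]) -> int:
    """Index of the phrase model with the minimum accumulated prediction error."""
    errors = prediction_errors(frames, predictors)
    scores = [phrase_error(errors, p) for p in phrases]
    return min(range(len(scores)), key=scores.__getitem__)
```

As a toy usage example, two stand-in "word models" on 1-D features, one predicting a rise (`x + 1`) and one a fall (`x - 1`), applied to the frames `0, 1, 2, 3, 2, 1, 0`, give the rise-then-fall phrase `[0, 1]` an accumulated error of 0, so `recognize` selects it over `[1, 0]` and `[0]`. The per-frame errors are computed once and shared by all phrase models, which keeps the cost of scoring N_P phrases linear in the number of phrases.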