Isolated word recognition using modular recurrent neural networks

Abstract: This paper describes a novel method of using recurrent neural networks (RNNs) for isolated word recognition. Each word in the target vocabulary is modeled by a fully connected recurrent network. To recognize an input utterance, the best-matching word is determined from the temporal output responses of the word models. The system is trained in two stages. In the first stage, the RNN speech models (RSMs) are trained independently to capture the essential static and temporal characteristics of individual words. This is done with an iterative re-segmentation training algorithm, which automatically gives the optimal phonetic segmentation for each training utterance. The second stage involves mutually discriminative training among the RSMs, aiming to minimize the probability of misclassification. A series of simulation experiments has been performed to demonstrate the effectiveness of the proposed recognition method. For the recognition of (A) 20 English words, (B) 11 Cantonese digits and (C) 58 Cantonese CV syllables, the top-1 accuracies are 91.9%, 93.6% and 87.1%, respectively.
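The modular recognition scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `TinyRSM`, the function `recognize`, the random untrained weights, the choice of unit 0 as the output unit, and the use of the mean output over time as the matching score are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random


class TinyRSM:
    """Hypothetical stand-in for one RNN speech model (one model per word)."""

    def __init__(self, n_in, n_units, seed):
        rng = random.Random(seed)
        self.n_units = n_units
        # Fully connected: each unit sees every input and every previous unit state.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_units)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_units)]
                      for _ in range(n_units)]

    def response(self, frames):
        """Return the temporal output response: one value in (0, 1) per frame."""
        state = [0.0] * self.n_units
        outputs = []
        for x in frames:
            new_state = []
            for i in range(self.n_units):
                a = sum(w * xi for w, xi in zip(self.w_in[i], x))
                a += sum(w * s for w, s in zip(self.w_rec[i], state))
                new_state.append(1.0 / (1.0 + math.exp(-a)))  # sigmoid unit
            state = new_state
            outputs.append(state[0])  # assumption: unit 0 acts as the output unit
        return outputs


def recognize(models, frames):
    """Pick the word whose model has the highest mean output (assumed score)."""
    scores = {word: sum(m.response(frames)) / len(frames)
              for word, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Usage with a toy three-word vocabulary and made-up 3-dimensional feature frames.
models = {w: TinyRSM(n_in=3, n_units=4, seed=i)
          for i, w in enumerate(["yes", "no", "stop"])}
frames = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1], [0.2, 0.2, 0.2]]
word, scores = recognize(models, frames)
print(word, scores)
```

Because each word has its own network, adding a word to the vocabulary in this sketch only requires adding one more entry to `models`. The two training stages described in the abstract are not shown here.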
