Integrated phoneme and function word architecture of hidden control neural networks for continuous speech recognition

We present a context-dependent, phoneme and function word based Hidden Control Neural Network (HCNN-CDF) architecture for continuous speech recognition. The system can be seen as a large-vocabulary extension of the word-based HCNN system proposed by Levin in 1990. We first analysed the context-independent HCNN modeling principle within the framework of the Linked Predictive Neural Network (LPNN) speech recognition system and found that it yields a 6% increase in word recognition accuracy at perplexity 402. The HCNN implementation also achieves significant savings in resource requirements and computational load compared with the LPNN. In speaker-dependent recognition experiments at perplexity 111, the current versions of the LPNN and HCNN-CDF systems achieve word recognition accuracies of 60% and 75%, respectively.
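The hidden control idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no network details, so the one-hot control code, the single-frame context, the layer sizes, and all function names below are assumptions chosen for illustration. The sketch shows one shared predictor whose mapping is switched by a control input, the per-state prediction errors it produces, and a left-to-right dynamic programming alignment that picks the control sequence with the lowest accumulated error.

```python
import numpy as np


def init_params(frame_dim, n_states, hidden_dim, rng):
    """Random weights for ONE shared predictor (all sizes are hypothetical)."""
    in_dim = frame_dim + n_states  # previous frame + one-hot control code
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, frame_dim)),
        "b2": np.zeros(frame_dim),
    }


def predict(params, prev_frames, state, n_states):
    """Predict the next frames from the previous ones under control `state`."""
    control = np.zeros((prev_frames.shape[0], n_states))
    control[:, state] = 1.0  # assumed encoding: one-hot state code
    x = np.concatenate([prev_frames, control], axis=1)
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def prediction_errors(params, frames, n_states):
    """errs[t, s]: squared error predicting frames[t + 1] from frames[t] in state s."""
    prev, target = frames[:-1], frames[1:]
    errs = np.empty((len(prev), n_states))
    for s in range(n_states):
        diff = predict(params, prev, s, n_states) - target
        errs[:, s] = np.sum(diff ** 2, axis=1)
    return errs


def align(errs):
    """Left-to-right alignment: at each step stay in a state or advance by one.

    Returns the minimum accumulated error and the state path.
    Assumes at least as many time steps as states.
    """
    n_steps, n_states = errs.shape
    cost = np.full((n_steps, n_states), np.inf)
    back = np.zeros((n_steps, n_states), dtype=int)
    cost[0, 0] = errs[0, 0]  # the path must start in the first state
    for t in range(1, n_steps):
        for s in range(n_states):
            stay = cost[t - 1, s]
            advance = cost[t - 1, s - 1] if s > 0 else np.inf
            if advance < stay:
                cost[t, s], back[t, s] = advance + errs[t, s], s - 1
            else:
                cost[t, s], back[t, s] = stay + errs[t, s], s
    path = [n_states - 1]  # the path must end in the last state
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return float(cost[-1, -1]), path[::-1]
```

Because every state shares the same weights and differs only in its control code, adding a state costs one extra input unit rather than a whole new predictor network, which is one plausible reading of the resource savings relative to the LPNN reported above.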

[1]  David S. Broomhead,et al.  Multivariable Functional Interpolation and Adaptive Networks , 1988, Complex Syst..

[2]  Large vocabulary speech recognition using neural prediction model , 1991 .

[3]  Mahesan Niranjan,et al.  Neural networks and radial basis functions in classifying static speech patterns , 1990 .

[4]  Esther Levin,et al.  Word recognition using hidden control neural architecture , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[5]  Hervé Bourlard,et al.  Neural nets and hidden Markov models: Review and generalizations , 1991, Speech Commun..

[6]  Alex Waibel,et al.  Large vocabulary recognition using linked predictive neural networks , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[7]  Raj Reddy,et al.  Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[8]  Ken-ichi Iso,et al.  Speaker-independent word recognition using a neural prediction model , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[9]  Naftali Tishby,et al.  A dynamical systems approach to speech processing , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[10]  Hervé Bourlard,et al.  Phonetic context in hybrid HMM/MLP continuous speech recognition , 1991, EUROSPEECH.

[11]  S. Renals,et al.  Phoneme classification experiments using radial basis functions , 1989, International 1989 Joint Conference on Neural Networks.

[12]  Raymond L. Watrous.  Context-modulated discrimination of similar vowels using second-order connectionist networks , 1989 .

[13]  Alexander H. Waibel,et al.  Integrated phoneme-function word architecture of hidden control neural networks for continuous speech recognition , 1991, EUROSPEECH.

[14]  Alex Waibel,et al.  Continuous speech recognition using linked predictive neural networks , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[15]  Alexander H. Waibel,et al.  Recent work in continuous speech recognition using the connectionist viterbi training procedure , 1991, EUROSPEECH.

[16]  H. Bourlard,et al.  Links Between Markov Models and Multilayer Perceptrons , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  F. Girosi,et al.  Networks for approximation and learning , 1990, Proc. IEEE.

[18]  D. Broomhead,et al.  Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks , 1988 .

[19]  Alex Waibel,et al.  Readings in speech recognition , 1990 .

[20]  Ken-ichi Iso,et al.  Large vocabulary speech recognition using neural prediction model , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.