Continuous speech recognition using linked predictive neural networks

The authors present a large-vocabulary continuous speech recognition system based on linked predictive neural networks (LPNNs). The system uses neural networks as predictors of speech frames, yielding distortion measures that the one-stage DTW algorithm uses to perform continuous speech recognition. The system currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 7, 111, and 402, respectively, outperforming several simple HMMs that have been tested. It was also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. The strengths and weaknesses of the predictive approach are discussed.
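
The predictive idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: plain callables stand in for trained networks, each predictor sees only the previous frame, and the alignment covers a single left-to-right predictor sequence rather than the full one-stage DTW search across word boundaries. The names `distortion` and `align` are hypothetical.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]
# Hypothetical stand-in for a trained predictive network:
# maps the previous frame to a predicted current frame.
Predictor = Callable[[Frame], List[float]]


def distortion(predictor: Predictor, prev: Frame, cur: Frame) -> float:
    """Squared Euclidean error between the predicted and the actual frame."""
    predicted = predictor(prev)
    return sum((p - c) ** 2 for p, c in zip(predicted, cur))


def align(frames: List[Frame], predictors: List[Predictor]) -> float:
    """Minimum cumulative distortion of a left-to-right alignment.

    Every frame after the first is assigned to one predictor. The path
    starts at the first predictor, ends at the last, and at each frame
    either stays on the same predictor or advances by one.
    """
    if len(frames) < 2 or not predictors:
        raise ValueError("need at least two frames and one predictor")
    inf = float("inf")
    n_states = len(predictors)
    # cost[s]: best cumulative distortion of a path ending at predictor s
    cost = [inf] * n_states
    cost[0] = distortion(predictors[0], frames[0], frames[1])
    for t in range(2, len(frames)):
        new_cost = [inf] * n_states
        for s in range(n_states):
            best_prev = min(cost[s], cost[s - 1] if s > 0 else inf)
            if best_prev < inf:
                new_cost[s] = best_prev + distortion(
                    predictors[s], frames[t - 1], frames[t]
                )
        cost = new_cost
    return cost[-1]
```

In this sketch, a predictor sequence that matches the dynamics of the input yields a lower cumulative distortion than a mismatched one, so a recognizer can prefer the sequence with the lowest score.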
