Global optimization of a neural network-hidden Markov model hybrid

An original method for integrating artificial neural networks (ANN) with hidden Markov models (HMM) is proposed. ANNs are suitable for performing phonetic classification, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results on speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.<<ETX>>

[1]  A. Waibel,et al.  Connectionist Viterbi training: a new hybrid method for continuous speech recognition , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[2]  Hsiao-Wuen Hon,et al.  Speaker-independent phone recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[3]  Esther Levin,et al.  Word recognition using hidden control neural architecture , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[4]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[5]  Yoshua Bengio,et al.  Programmable execution of multi-layered networks for automatic speech recognition , 1989, CACM.

[6]  Piero Cosi,et al.  Phonetically-based multi-layered neural networks for vowel classification , 1990, Speech Commun..

[7]  G. Stewart Introduction to matrix computations , 1973 .

[8]  Renato De Mori,et al.  A hybrid coder for hidden Markov models using a recurrent neural networks , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[9]  K. Stevens The Potential Role of Property Detectors in the Perception of Consonants , 1975 .

[10]  Stephen E. Levinson,et al.  A speaker-independent, syntax-directed, connected word recognition system based on hidden Markov models and level building , 1985, IEEE Trans. Acoust. Speech Signal Process..

[11]  John S. Bridle,et al.  Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters , 1989, NIPS.

[12]  Michael Picheny,et al.  On a model-robust training method for speech recognition , 1988, IEEE Trans. Acoust. Speech Signal Process..

[13]  Victor Zue,et al.  Speech database development at MIT: Timit and beyond , 1990, Speech Commun..

[14]  Richard Lippmann,et al.  Review of Neural Networks for Speech Recognition , 1989, Neural Computation.

[15]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[16]  Hervé Bourlard,et al.  Continuous speech recognition using multilayer perceptrons with hidden Markov models , 1990, International Conference on Acoustics, Speech, and Signal Processing.