A comparative study on hybrid acoustic phonetic decoders based on artificial neural networks

In this paper we compare two hybrid acoustic-phonetic decoders based on Artiicial Neural Networks (ANN). We evaluate them on the task of recognizing stop phones in continuous speech independently from the speaker. ANNs are well suited to perform detailed phonetic distinctions. In general, techniques based on Dynamic Programming (DP), in particular Hidden Markov Models (HMMs), have proven to be successful at modeling the temporal structure of the speech signal. In the approach described here, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters of the ANN/HMM decoder. Comparative experiments using this ANN/HMM hybrid decoder and another ANN-DP hybrid are reported for the TIMIT database.

[1]  Yoshua Bengio,et al.  Programmable execution of multi-layered networks for automatic speech recognition , 1989, CACM.

[2]  George Zavaliagkos,et al.  Continuous speech recognition using segmental neural nets , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[3]  A. Waibel,et al.  Connectionist Viterbi training: a new hybrid method for continuous speech recognition , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[4]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[5]  Yann LeCun,et al.  Generalization and network design strategies , 1989 .

[6]  Richard Lippmann,et al.  Review of Neural Networks for Speech Recognition , 1989, Neural Computation.

[7]  John S. Bridle,et al.  Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters , 1989, NIPS.

[8]  Yoshua Bengio,et al.  Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge , 1989, NIPS.

[9]  Esther Levin,et al.  Word recognition using hidden control neural architecture , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[10]  Richard P. Lippmann,et al.  Review of Neural Networks for Speech Recognition , 1989, Neural Computation.

[11]  Hervé Bourlard,et al.  Continuous speech recognition using multilayer perceptrons with hidden Markov models , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[12]  Frank Fallside,et al.  Phoneme Recognition from the TIMIT database using Recurrent Error Propa-gation Networks , 1990 .