Use of multilayer networks for the recognition of phonetic features and phonemes

Artificial neural networks capable of hard learning offer a new approach to automatic speech recognition. The Boltzmann machine algorithm and the error back‐propagation algorithm have been used to perform speaker normalization. Spectral segments are represented by spectral lines, and speaker‐independent recognition of place of articulation for vowels is performed on these lines. Performance of the networks is shown to depend on the coding of the input data. Samples were extracted from the continuous speech of 38 speakers. The error rates obtained on a test set of 72 samples (4.2% with the Boltzmann machine and 6.9% with error back‐propagation) are lower than those of previous experiments on the same data with continuous hidden Markov models (7.3% error on the test set and 3% error on the training set). These experiments are part of an attempt to construct a data‐driven speech recognition system with multiple neural networks specialized for different tasks. Results are also reported on the recognition performance of other trained networks, such as one trained on the E‐set consonants.
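To put the neural-network error rates in concrete terms, the short sketch below converts the quoted percentages for the 72-sample test set into whole numbers of misclassified samples. The resulting counts (3 and 5) are inferred from the percentages and are not stated in the abstract; the function names are hypothetical and used only for illustration.

```python
TEST_SET_SIZE = 72  # test-set size stated in the abstract


def implied_errors(rate_percent, total=TEST_SET_SIZE):
    """Nearest whole number of errors consistent with a quoted error rate."""
    return round(rate_percent / 100.0 * total)


def error_rate_percent(errors, total=TEST_SET_SIZE):
    """Error rate in percent, rounded to one decimal place as in the abstract."""
    return round(100.0 * errors / total, 1)


# Assumption: the quoted rates are rounded ratios of whole error counts.
boltzmann_errors = implied_errors(4.2)  # Boltzmann machine
backprop_errors = implied_errors(6.9)   # error back-propagation

print(boltzmann_errors, error_rate_percent(boltzmann_errors))
print(backprop_errors, error_rate_percent(backprop_errors))
```

Under this reading, the gap between the two networks amounts to two test samples, so the comparison rests on a small number of cases.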
