Data-Driven Execution of Multi-Layered Networks for Automatic Speech Recognition

A set of Multi-Layered Networks (MLNs) for Automatic Speech Recognition (ASR) is proposed. Such a set makes it possible to integrate information extracted with variable resolution in the time and frequency domains, and to keep the number of links between network nodes small so that significant generalization can be achieved during learning with a training set of reasonable size. Subsets of networks can be executed depending on preconditions based on descriptions of the time evolution of signal energies, allowing spectral properties that are significant in different acoustic situations to be learned. Preliminary experiments on speaker-independent recognition of the letters of the E-set are reported. Voices from 70 speakers were used for learning, and voices from 10 new speakers were used for testing. An overall error rate of 9.5% was obtained on the test set, showing that results better than those previously reported can be achieved.
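The idea of data-driven execution, in which only the subset of networks whose precondition holds is run, can be illustrated with a small sketch. The abstract does not specify the actual preconditions, features, or network topologies. Everything below is therefore a hypothetical illustration: the helpers `frame_energies`, `describe_energy`, `TinyMLP`, the `NETWORK_SUBSETS` table, the energy labels, and all thresholds and sizes are assumptions, and the networks use untrained random weights. The sketch computes frame energies, reduces their time evolution to a coarse label, and dispatches to the network subset registered for that label. It uses one output per E-set letter (B, C, D, E, G, P, T, V, Z).

```python
import math
import random

# Hypothetical front end: short-time energy (dB) for each frame of a sampled signal.
def frame_energies(signal, frame_len=160, hop=80):
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # small floor avoids log(0)
    return energies

# Hypothetical precondition: a coarse description of how the energy evolves.
# Onset = first frame above floor_db; the onset is "abrupt" if the energy peak
# is reached within max_rise_frames frames of it, otherwise "gradual".
def describe_energy(energies, floor_db=-40.0, max_rise_frames=2):
    active = [i for i, e in enumerate(energies) if e > floor_db]
    if not active:
        return "silence"
    onset = active[0]
    peak = max(range(len(energies)), key=lambda i: energies[i])  # first maximum
    return "abrupt_onset" if peak - onset <= max_rise_frames else "gradual_onset"

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical small fully connected network with random (untrained) weights.
class TinyMLP:
    def __init__(self, n_in, n_hidden, n_out, seed):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in self.w2]

E_SET = ["B", "C", "D", "E", "G", "P", "T", "V", "Z"]

# Hypothetical mapping from energy description to the network subset to execute.
NETWORK_SUBSETS = {
    "abrupt_onset": [TinyMLP(4, 3, len(E_SET), seed=1)],
    "gradual_onset": [TinyMLP(4, 3, len(E_SET), seed=2), TinyMLP(4, 3, len(E_SET), seed=3)],
    "silence": [],
}

def run_selected(signal, features):
    """Evaluate the precondition, then run only the networks it selects."""
    label = describe_energy(frame_energies(signal))
    outputs = [net.forward(features) for net in NETWORK_SUBSETS[label]]
    return label, outputs
```

For example, `run_selected([0.0] * 800 + [0.5] * 800, [0.1, 0.2, 0.3, 0.4])` labels the signal as an abrupt onset and runs a single network, whereas a slowly rising ramp selects the two "gradual" networks. Because networks outside the selected subset are never evaluated, each one only has to learn the spectral properties relevant to its own acoustic situation.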
