Modularity and scaling in large phonemic neural networks

The authors train several small time-delay neural networks aimed at all phonemic subcategories (nasals, fricatives, etc.) and report excellent fine phonemic discrimination performance in all cases. Exploiting the hidden structure of these small phonemic subcategory networks, they propose several techniques that make it possible to grow larger nets in an incremental and modular fashion without loss in recognition performance and without the need for excessive training time or additional data. The techniques include class discriminatory learning, connectionist glue, selective/partial learning, and all-net fine tuning. A set of experiments shows that stop consonant networks (BDGPTK) constructed from subcomponent BDG- and PTK-nets achieved up to 98.6% correct recognition, compared to 98.3% and 98.7% correct for the BDG- and PTK-nets, respectively. Similarly, an incrementally trained network aimed at all consonants achieved recognition scores of about 96% correct. These results are comparable to the performance of the subcomponent networks and significantly better than that of several alternative speech recognition methods.
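
To make the idea of growing a larger net from subcomponent nets more concrete, the following is a minimal sketch of one possible reading of "connectionist glue" combined with selective/partial learning. It is not the authors' implementation: it uses plain dense sigmoid layers rather than time-delay layers, the random matrices `W_bdg` and `W_ptk` merely stand in for pretrained BDG- and PTK-net hidden weights, and all sizes (`N_IN`, `N_HID`, `N_GLUE`) and function names are hypothetical. The frozen hidden units of both subnets are concatenated with a few freshly initialized glue units, and only the glue and output weights are updated.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN = 16    # hypothetical input feature size
N_HID = 8    # hypothetical hidden units per subcomponent net
N_GLUE = 4   # hypothetical number of extra "glue" hidden units
N_OUT = 6    # one output per stop consonant: B, D, G, P, T, K

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-ins for pretrained hidden-layer weights (assumption: in practice
# these would come from separately trained BDG- and PTK-nets). Frozen here.
W_bdg = rng.normal(size=(N_IN, N_HID))
W_ptk = rng.normal(size=(N_IN, N_HID))

# Trainable parts: glue units and the combined output layer.
W_glue = rng.normal(scale=0.1, size=(N_IN, N_GLUE))
W_out = rng.normal(scale=0.1, size=(2 * N_HID + N_GLUE, N_OUT))

def combined_forward(x, W_glue, W_out):
    """Concatenate frozen subnet features with glue features, then classify."""
    h = np.concatenate(
        [sigmoid(x @ W_bdg), sigmoid(x @ W_ptk), sigmoid(x @ W_glue)], axis=-1
    )
    return sigmoid(h @ W_out), h

def train_step(x, target, W_glue, W_out, lr=0.1):
    """One gradient step on squared error; W_bdg and W_ptk are never updated."""
    y, h = combined_forward(x, W_glue, W_out)
    delta_out = (y - target) * y * (1.0 - y)
    grad_out = h.T @ delta_out / len(x)
    # Back-propagate only into the glue units (the last N_GLUE hidden units).
    h_glue = h[:, 2 * N_HID:]
    delta_glue = (delta_out @ W_out[2 * N_HID:].T) * h_glue * (1.0 - h_glue)
    grad_glue = x.T @ delta_glue / len(x)
    return W_glue - lr * grad_glue, W_out - lr * grad_out
```

Under these assumptions, a final pass that also unfreezes `W_bdg` and `W_ptk` would correspond loosely to what the abstract calls all-net fine tuning; the sketch omits it to keep the frozen/trainable split visible.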
