Connectionist Models and their Application to Automatic Speech Recognition

Abstract The purpose of this chapter is to study the application of some connectionist models to automatic speech recognition. Ways to take advantage of a priori knowledge in the design of those models are considered first. Algorithms for some recurrent networks are then described, since these networks are well suited to handling temporal dependencies such as those found in speech. Some simple methods that accelerate the convergence of gradient descent with the back-propagation algorithm are discussed. An alternative approach to speeding up the networks is to use systems based on Radial Basis Functions (local representation). Detailed results of several experiments with these networks on the recognition of phonemes from the TIMIT database are presented. In conclusion, a cognitively relevant model is proposed. This model combines a local-representation subnetwork and a distributed-representation subnetwork, which provide a fast-learning and a slow-learning capability, respectively.
