Experiments for isolated-word recognition with single- and two-layer perceptrons

Abstract

Several design strategies for feed-forward networks are examined within the scope of pattern classification. Single- and two-layer perceptron models are adapted for experiments in isolated-word recognition. Direct (one-step) classification as well as several hierarchical (two-step) schemes are considered. For a vocabulary of 20 English words spoken repeatedly by 11 speakers, the word classes are found to be separable by hyperplanes in the chosen feature space. Since the underlying database contains only a small training set for speaker-dependent word recognition, an automatic expansion of the training material improves the generalization properties of the networks. This method accounts for a wide variety of observable temporal structures for each word and gives a better overall estimate of the network parameters, leading to a recognition rate of 99.5%. For speaker-independent word recognition, a hierarchical structure with pairwise training of two-class models is superior to a single uniform network (98% average recognition rate).
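The abstract's central finding, that the 20 word classes are separable by hyperplanes, amounts to one weight vector (hyperplane) per word in a single-layer perceptron. The sketch below illustrates that idea only; it is not the authors' implementation. The feature dimension (128), learning rate, epoch count, and the random toy data are all illustrative assumptions, since the paper's actual feature coding is not reproduced here.

```python
# Minimal sketch of direct (one-step) classification with a
# single-layer perceptron: one hyperplane per word class.
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 20     # vocabulary size from the paper
N_FEATURES = 128   # assumed length of the word feature vector

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Train one weight vector per class on fixed-length feature vectors."""
    W = np.zeros((N_CLASSES, N_FEATURES + 1))      # +1 column for the bias
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append constant bias input
    T = np.eye(N_CLASSES)[y]                       # one-hot target matrix
    for _ in range(epochs):
        out = 1.0 / (1.0 + np.exp(-Xb @ W.T))      # sigmoid class outputs
        W += lr * (T - out).T @ Xb / len(X)        # cross-entropy gradient step
    return W

def classify(W, x):
    """Assign the class whose hyperplane gives the highest score."""
    return int(np.argmax(W @ np.append(x, 1.0)))

# Toy usage with random data standing in for real word feature vectors.
X = rng.normal(size=(200, N_FEATURES))
y = rng.integers(0, N_CLASSES, size=200)
W = train_perceptron(X, y)
print(classify(W, X[0]))
```

The hierarchical scheme for speaker-independent recognition would, by the abstract's description, replace this single uniform network with one two-class unit per word pair (C(20, 2) = 190 pairs for the 20-word vocabulary), combining the pairwise decisions into a final label, for example by voting; the exact combination rule is not specified in the abstract.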
