Speech recognition using temporal decomposition and multi-layer feed-forward automata

Intraspeaker and interspeaker variability is a major source of error in automatic speech recognition. The authors report on two series of experiments using multilayer feed-forward automata (MLFFA) to control some aspects of this variability. The first series concerns the classification of spectral targets obtained from a robust implementation of temporal decomposition: an MLFFA accepts three successive targets and outputs an allophonic label. So far, no improvement has been found over traditional classification techniques (e.g. k-nearest neighbors). In the second series of experiments, spectral transformations computed by MLFFA are introduced to adapt the recognizer to new speakers. Compared with linear techniques (multivariate regression and canonical correlation analysis), the MLFFA approach offers some improvement.
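The classification setup in the first series can be sketched as a small feed-forward network whose input is the concatenation of three successive spectral targets and whose output is an allophone class. The dimensions, layer sizes, and random weights below are illustrative assumptions, not values from the paper; a trained system would fit the weights to labeled targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper):
# each spectral target is a 16-coefficient vector; three successive
# targets form one input; 10 allophone classes.
TARGET_DIM, N_TARGETS, HIDDEN, N_CLASSES = 16, 3, 32, 10

def init_mlffa(in_dim, hidden, out_dim, rng):
    """Randomly initialised two-layer feed-forward network (untrained)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def classify(params, targets):
    """Map three successive spectral targets to an allophone label index."""
    x = np.concatenate(targets)                   # (N_TARGETS * TARGET_DIM,)
    h = np.tanh(x @ params["W1"] + params["b1"])  # hidden layer
    logits = h @ params["W2"] + params["b2"]      # one score per allophone
    return int(np.argmax(logits))                 # predicted class index

params = init_mlffa(N_TARGETS * TARGET_DIM, HIDDEN, N_CLASSES, rng)
three_targets = [rng.normal(size=TARGET_DIM) for _ in range(N_TARGETS)]
label = classify(params, three_targets)
```

The same forward-pass structure, with a spectral vector rather than a class score as output, would serve for the speaker-adaptation transformations compared against multivariate regression in the second series.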
