A connectionist model for phoneme recognition in continuous speech

A connectionist structure for phoneme recognition in continuous speech is described. It has two main parts. The first is a sound subunit classifier, a three-layer back-propagation network that classifies speech subunits from frames of spectral speech data. The second is a sequence classifier, a network of neural-like units that classifies phonemes from input sequences of subunits by their occurrence and duration. Results are given for a 15-phoneme subset of British English, for a single speaker. The subset includes the difficult syllable-initial and syllable-final stop consonants, fricatives, vowels, and diphthongs. The overall recognition accuracy achieved is 87%.
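The two-stage structure can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes sigmoid units, toy hand-set weights in place of weights learned by back-propagation, and a simple template matcher standing in for the network of neural-like units. All function names, the template format, and the minimum-duration rule are hypothetical and chosen only for illustration.

```python
import math
from itertools import groupby


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected sigmoid layer; weights[j][i] links input i to unit j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def classify_frame(frame, w_hidden, b_hidden, w_out, b_out):
    """Stage 1 (assumed form): input -> hidden -> output forward pass.

    Returns the index of the most active output unit, read as a subunit label.
    """
    hidden = layer(frame, w_hidden, b_hidden)
    output = layer(hidden, w_out, b_out)
    return max(range(len(output)), key=output.__getitem__)


def runs(labels):
    """Collapse per-frame subunit labels into (label, duration_in_frames) runs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


def match_phoneme(subunit_runs, templates):
    """Stage 2 stand-in (hypothetical): match runs by occurrence and duration.

    templates maps a phoneme to a list of (subunit, minimum_duration) pairs.
    Returns the first matching phoneme, or None if nothing matches.
    """
    for phoneme, template in templates.items():
        if len(template) == len(subunit_runs) and all(
            label == want and duration >= min_duration
            for (label, duration), (want, min_duration) in zip(subunit_runs, template)
        ):
            return phoneme
    return None


# Toy weights (assumed, not trained): 2 spectral inputs, 2 hidden, 2 subunits.
W_HIDDEN = [[4.0, -4.0], [-4.0, 4.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[4.0, -4.0], [-4.0, 4.0]]
B_OUT = [0.0, 0.0]

# Hypothetical template: phoneme "b" is subunit 0 for at least 2 frames,
# followed by subunit 1 for at least 1 frame.
TEMPLATES = {"b": [(0, 2), (1, 1)]}

frames = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
labels = [classify_frame(f, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT) for f in frames]
phoneme = match_phoneme(runs(labels), TEMPLATES)
```

The sketch keeps the two stages separate, so the frame classifier sees only spectral frames while the sequence stage sees only subunit labels and their durations, mirroring the division of labour described above.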
