Speaker-independent phonetic classification in continuous English letters

A phonetic front-end for speaker-independent recognition of continuous letter strings is described. A feedforward neural network is trained to classify 3 msec speech frames as one of the 30 phonemes in the English alphabet. Phonetic context is used in two ways: first, by providing spectral and waveform information before and after the frame to be classified, and second, by a second-pass network that uses both acoustic features and the phonetic outputs of the first-pass network. This use of context reduced the error rate by 50%. The effectiveness of the DFT and the more compact PLP (perceptual linear predictive) analysis is compared, and several other features, such as zero crossing rate, are investigated. A frame-based phonetic classification performance of 75.7% was achieved.
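
The two uses of context described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`zero_crossing_rate`, `stack_context`, `second_pass_input`), the symmetric window `width`, and the edge-padding rule are all assumptions made for illustration, and the abstract does not state how many context frames were used or how the second-pass input was assembled.

```python
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def stack_context(
    features: List[List[float]], index: int, width: int
) -> List[float]:
    """Concatenate feature vectors from index - width to index + width.

    Positions beyond either end of the utterance reuse the nearest edge
    frame (an assumed padding rule, not taken from the paper).
    """
    last = len(features) - 1
    stacked: List[float] = []
    for i in range(index - width, index + width + 1):
        clamped = min(max(i, 0), last)
        stacked.extend(features[clamped])
    return stacked


def second_pass_input(
    acoustic: List[List[float]],
    first_pass_outputs: List[List[float]],
    index: int,
    width: int,
) -> List[float]:
    """Hypothetical second-pass input: stacked acoustic features followed
    by stacked first-pass phonetic outputs for the same frame window."""
    return (
        stack_context(acoustic, index, width)
        + stack_context(first_pass_outputs, index, width)
    )
```

Here `acoustic` would hold one feature vector per frame (for example DFT or PLP coefficients, optionally with the zero crossing rate appended), and `first_pass_outputs` would hold the first-pass network's per-frame phoneme scores. The classifier networks themselves are omitted.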
