A phonetic front-end for speaker-independent recognition of continuous letter strings is described. A feedforward neural network is trained to classify 3 msec speech frames as one of the 30 phonemes in the English alphabet. Phonetic context is used in two ways: first, by providing spectral and waveform information before and after the frame to be classified, and second, by a second-pass network that uses both acoustic features and the phonetic outputs of the first-pass network. This use of context reduced the error rate by 50%. The DFT is compared with the more compact PLP (perceptual linear predictive) analysis, and several other features, such as zero crossing rate, are investigated. A frame-based phonetic classification performance of 75.7% was achieved.
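The two uses of context described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the context offsets, the feature dimensions, and the choice to repeat edge frames at utterance boundaries are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows how a classifier input might be assembled from neighboring frames, how a second-pass input might combine acoustic features with first-pass phonetic outputs, and how frame-level accuracy is counted.

```python
from typing import List, Sequence

Frame = List[float]


def stack_context(frames: Sequence[Frame], t: int, offsets: Sequence[int]) -> Frame:
    """Concatenate the feature vectors at t + offset for each offset.

    Indices outside the utterance are clamped to the first or last frame
    (an assumption; the abstract does not say how edges are handled).
    """
    last = len(frames) - 1
    stacked: Frame = []
    for off in offsets:
        idx = min(max(t + off, 0), last)
        stacked.extend(frames[idx])
    return stacked


def second_pass_input(
    acoustic: Sequence[Frame],
    first_pass: Sequence[Frame],
    t: int,
    offsets: Sequence[int],
) -> Frame:
    """Build a hypothetical second-pass input for frame t.

    Combines the acoustic features of frame t with the first-pass
    phonetic outputs of the surrounding frames.
    """
    return list(acoustic[t]) + stack_context(first_pass, t, offsets)


def frame_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames whose predicted phoneme label matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


if __name__ == "__main__":
    # Toy data: 3 frames, 2 acoustic features each (illustrative values only).
    acoustic = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
    # Toy first-pass outputs over 2 phoneme classes instead of 30.
    first_pass = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]

    print(stack_context(acoustic, 0, [-1, 0, 1]))
    print(second_pass_input(acoustic, first_pass, 1, [-1, 1]))
    print(frame_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))
```

In a real system each stacked vector would be fed to a trained network; here the first-pass outputs are simply given as toy values so that the data flow between the two passes is visible.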