PHONEME RECOGNITION USING ARTIFICIAL NEURAL NETWORKS

An artificial neural network has been trained to recognize phonemes using the error back-propagation technique. First, a coarse feature network was trained to extract seven quasi-phonetic features from the spectral frames of a Bark-scaled filter bank. The outputs of this net, together with the spectral outputs of the filter bank, were fed to a phoneme recognition net. A seven-frame-wide window of the feature-net output was used to include the context of the frame being classified. Both Swedish and Hungarian speech material were used; the results reported here are for Hungarian. The coarse features were recognized with 80% to 93% accuracy, and performance proved relatively insensitive to changes of speaker or language. The frame-level phone recognition rate was 55%. With manual segmentation, the phone recognition rate was 64%, and in 82% of the cases the correct phone was among the three best phoneme candidates.
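The input assembly and the "among the best three candidates" score described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not settle. The seven-frame window is assumed to be centred on the classified frame, giving three frames of context on each side. Frames past the utterance edges are assumed to be replaced by the nearest valid frame. Only the current frame's filter-bank outputs are assumed to be appended. The names `context_input`, `top_k_hit` and `top_k_rate` are hypothetical. The networks themselves and their back-propagation training are not reproduced.

```python
from typing import List, Sequence

# From the abstract: 7 quasi-phonetic features, 7-frame context window.
NUM_FEATURES = 7
WINDOW = 7
HALF = WINDOW // 2  # assumed: 3 frames of context on each side of frame t

Frame = List[float]


def context_input(feature_frames: Sequence[Frame],
                  spectral_frames: Sequence[Frame],
                  t: int) -> Frame:
    """Hypothetical phoneme-net input for frame t.

    Concatenates the feature-net outputs of frames t-3 .. t+3 with the
    filter-bank outputs of frame t only (assumption). Out-of-range frames
    are clamped to the first/last frame (edge handling is an assumption).
    """
    n = len(feature_frames)
    vec: Frame = []
    for offset in range(-HALF, HALF + 1):
        idx = min(max(t + offset, 0), n - 1)  # clamp to utterance boundaries
        vec.extend(feature_frames[idx])
    vec.extend(spectral_frames[t])  # spectral context of the current frame
    return vec


def top_k_hit(scores: Sequence[float], correct: int, k: int = 3) -> bool:
    """True if the correct phoneme index is among the k highest-scoring outputs."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return correct in ranked[:k]


def top_k_rate(all_scores: Sequence[Sequence[float]],
               labels: Sequence[int], k: int = 3) -> float:
    """Fraction of tokens whose correct phoneme is in the top k (k=1: plain accuracy)."""
    hits = sum(top_k_hit(s, y, k) for s, y in zip(all_scores, labels))
    return hits / len(labels)
```

With, say, 4 filter-bank channels (a hypothetical number; the real channel count is not given here), each input vector has 7 × 7 + 4 = 53 elements. With `k=1`, `top_k_rate` gives the ordinary recognition rate. With `k=3`, it gives the "among the best three candidates" figure.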
