The effect of fundamental frequency on Mandarin speech recognition

We study the effects of modeling tone in Mandarin speech recognition. Mandarin has five tones, including the neutral tone, and tone is a syllable-level phenomenon. A direct acoustic manifestation of tone is the fundamental frequency (f0). We report on the effect of f0 on the acoustic recognition accuracy of a Mandarin recognizer. In particular, we put f0, its first derivative (f0′), and its second derivative (f0′′) in separate streams of the feature vector. Stream weights are adjusted to investigate the individual effects of f0, f0′, and f0′′ on recognition accuracy. Our results show that incorporating f0 itself reduced accuracy, whereas f0′ improved accuracy and f0′′ appeared to have no effect.
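The stream setup described above can be illustrated with a minimal sketch. It is not the recognizer used in this work: the regression-based derivative, the single-Gaussian stream models, and all names (`deltas`, `gaussian_logpdf`, `stream_weighted_loglik`, the stream keys) are assumptions made for illustration. It shows how f0′ and f0′′ might be derived from an f0 contour, and how per-stream log-likelihoods can be combined with stream weights so that setting a weight to zero removes that stream's influence.

```python
import math

def deltas(seq, window=2):
    """Regression-based time derivative of a contour.

    Assumed formula: sum_k k*(x[t+k] - x[t-k]) / (2*sum_k k^2),
    with the end values repeated at the edges.
    """
    n = len(seq)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, window + 1):
            ahead = seq[min(t + k, n - 1)]
            behind = seq[max(t - k, 0)]
            num += k * (ahead - behind)
        out.append(num / denom)
    return out

def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def stream_weighted_loglik(streams, params, weights):
    """Weighted sum of per-stream log-likelihoods for one frame.

    streams: stream name -> observed value
    params:  stream name -> (mean, variance) of a hypothetical state model
    weights: stream name -> stream weight (0 disables the stream)
    """
    total = 0.0
    for name, x in streams.items():
        mean, var = params[name]
        total += weights[name] * gaussian_logpdf(x, mean, var)
    return total

# Hypothetical f0 contour in Hz (illustrative values only).
f0 = [120.0, 124.0, 129.0, 133.0, 136.0, 138.0]
d_f0 = deltas(f0)      # first derivative, f0'
dd_f0 = deltas(d_f0)   # second derivative, f0''
```

Varying one weight while holding the others fixed is one way to isolate the contribution of a single stream, which mirrors the kind of experiment the abstract describes.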
