Tone recognition with fractionized models and outlined features

Different feature extraction and tone modeling schemes are investigated on both speaker-dependent and speaker-independent continuous speech databases. Tone recognition features can be classified as detailed features, which use the entire F0 curve, and outlined features, which capture only the main structure of the F0 curve. Tone models of different sizes, ranging from very simple one-tone-one-model schemes to complex phoneme-dependent tone models, differ in their ability to characterize tone. Our experiments support two conclusions. First, the detailed information of the F0 curve is not necessary for tone recognition. The outlined features not only reduce the number of parameters but also improve the accuracy of tone recognition. The proposed subsection average F0 and ΔF0 are shown to be effective outlined features. Second, the one-tone-one-model scheme is not sufficient. Building phoneme-dependent tone models substantially improves recognition accuracy, especially for speaker-independent data. We therefore suggest using fractionized models, trained with the outlined features, for tone recognition.
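
As a rough illustration of what an outlined feature of this kind might look like, the sketch below splits a syllable's F0 contour into a few subsections and keeps only the per-subsection averages and their differences. The abstract does not define these features precisely, so everything here is an assumption: the function name `subsection_features`, the default of three subsections, the equal-length split, and the reading of ΔF0 as the difference between consecutive subsection averages are all hypothetical choices, not the authors' method.

```python
def subsection_features(f0, n_sections=3):
    """Return (means, deltas) for an F0 contour given as a list of Hz values.

    Assumptions (not taken from the paper): the contour contains only voiced
    frames, subsections are of roughly equal length, and delta-F0 is the
    difference between consecutive subsection averages.
    """
    n = len(f0)
    if n_sections < 1 or n < n_sections:
        raise ValueError("need at least one frame per subsection")

    # Integer boundaries that divide the contour into n_sections parts.
    bounds = [i * n // n_sections for i in range(n_sections + 1)]

    # Average F0 within each subsection.
    means = [sum(f0[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

    # Change in average F0 from one subsection to the next.
    deltas = [later - earlier for earlier, later in zip(means, means[1:])]
    return means, deltas


if __name__ == "__main__":
    # A steadily rising contour, loosely resembling a rising tone.
    contour = [100, 110, 120, 130, 140, 150]
    means, deltas = subsection_features(contour)
    print(means, deltas)
```

Under these assumptions, a contour of any length is reduced to a fixed, small number of values, which is the sense in which outlined features use fewer parameters than the full F0 curve.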
