Tone recognition in continuous Cantonese speech using supratone models.

This paper studies automatic tone recognition in continuous Cantonese speech. Cantonese is a major Chinese dialect known for its rich tone inventory, and tone information is a useful knowledge source for automatic speech recognition of Cantonese. Cantonese tone recognition is difficult because the tones have similarly shaped pitch contours and are differentiated mainly by their relative pitch heights. In natural speech, the pitch level of a tone may shift up or down, and the F0 ranges of different tones overlap, making the tones acoustically indistinguishable within the domain of a single syllable. Our study shows that the relative pitch heights are largely preserved between neighboring tones. Based on this observation, a novel method of supratone modeling is proposed for Cantonese tone recognition. Each supratone model characterizes the F0 contour of two or three tones in succession. The tone sequence of a continuous utterance is formed as an overlapped concatenation of supratone units, and the most likely tone sequence is determined under phonological constraints on syllable-tone combinations. The proposed method attains an accuracy of 74.68% in speaker-independent tone recognition experiments. In particular, confusion among tones with similar contour shapes is substantially reduced.
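
The overlapped concatenation of supratone units can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: it assumes two-tone (ditone) units only, and it assumes that some supratone model has already produced a log-score for every tone pair spanning each pair of adjacent syllables. The names `decode_tones`, `pair_scores`, and `allowed` are hypothetical, and `allowed` is a simplified stand-in for the phonological constraints on syllable-tone combinations.

```python
# Minimal sketch, assuming precomputed ditone log-scores (hypothetical input).
TONES = (1, 2, 3, 4, 5, 6)  # tone labels used for illustration


def decode_tones(pair_scores, allowed=None):
    """Return the best-scoring tone sequence for an utterance.

    pair_scores[i] maps a tone pair (a, b) to the log-score of the ditone
    unit covering syllables i and i+1. Consecutive units overlap by one
    syllable, so unit i must end in the tone that unit i+1 starts with.
    allowed[i] is the set of tones permitted for syllable i (assumed
    stand-in for phonological constraints); None permits every tone.
    """
    n = len(pair_scores) + 1  # number of syllables
    if allowed is None:
        allowed = [set(TONES)] * n

    # best[t] = (score, path) of the best partial sequence ending in tone t
    best = {t: (0.0, [t]) for t in allowed[0]}
    for i, scores in enumerate(pair_scores):
        nxt = {}
        for b in allowed[i + 1]:
            candidates = [
                (s + scores.get((a, b), float("-inf")), path + [b])
                for a, (s, path) in best.items()
            ]
            nxt[b] = max(candidates, key=lambda c: c[0])
        best = nxt
    return max(best.values(), key=lambda c: c[0])[1]


def toy_scores(favoured):
    """Build toy scores: every pair gets -5.0 except the favoured ones."""
    scores = {(a, b): -5.0 for a in TONES for b in TONES}
    scores.update(favoured)
    return scores


if __name__ == "__main__":
    utterance = [
        toy_scores({(1, 3): 0.0, (1, 2): -1.0}),
        toy_scores({(3, 6): 0.0, (2, 6): -1.0}),
    ]
    print(decode_tones(utterance))
    # Restrict the second syllable to tones 2 and 4.
    print(decode_tones(utterance, allowed=[set(TONES), {2, 4}, set(TONES)]))
```

Because each unit shares a syllable with its neighbor, the search only needs to track the best partial path ending in each tone, which keeps decoding linear in utterance length. Extending the sketch to three-tone units would require tracking the last two tones instead of one.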
