Tone nucleus modeling for Chinese lexical tone recognition

This paper presents a new scheme for handling variations in fundamental frequency (F0) contours in lexical tone recognition of continuous Chinese speech. We divide the F0 contour of a syllable into a tone nucleus and the adjacent articulatory transitions, and use only the acoustic features of the tone nucleus for tone recognition. The tone nucleus of a syllable is assumed to carry the target F0 of the associated lexical tone, and it usually conforms more closely to the standard tone pattern than the articulatory transitions do. The tone nucleus is detected from a syllable F0 contour by a two-step algorithm. First, a segmental K-means segmentation algorithm divides the syllable F0 contour into several linear F0 loci, which serve as candidates for the tone nucleus. Then, a predictor based on linear discriminant analysis chooses the tone nucleus from the set of candidates. In speaker-dependent tone recognition experiments using tonal HMMs, the new approach improved the tone recognition rate by up to 6% compared with a conventional approach. This indicates not only that the tone nucleus retains important discriminant information for the lexical tones, but also that our tone-nucleus-based tone recognition algorithm works properly.
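The first step of the detection algorithm, splitting a syllable F0 contour into a few linear loci, can be illustrated with a short sketch. This is not the segmental K-means procedure used in the paper: it substitutes an exhaustive dynamic-programming search that minimises the total least-squares line-fit error over contiguous segments. The function names `line_fit_error` and `segment_f0`, the two-frame minimum segment length, and the toy contour are assumptions made for illustration only. The second step, choosing the nucleus among the candidates with linear discriminant analysis, is not sketched here.

```python
def line_fit_error(t, f):
    """Sum of squared residuals of the least-squares line through (t, f)."""
    n = len(t)
    if n < 2:
        return 0.0
    mt = sum(t) / n
    mf = sum(f) / n
    stt = sum((x - mt) ** 2 for x in t)
    stf = sum((x - mt) * (y - mf) for x, y in zip(t, f))
    slope = stf / stt if stt > 0 else 0.0
    return sum((y - mf - slope * (x - mt)) ** 2 for x, y in zip(t, f))


def segment_f0(f0, k, min_len=2):
    """Split f0 into k contiguous segments minimising total line-fit error.

    Returns a list of (start, end) frame-index pairs, end exclusive.
    min_len is an assumed minimum segment length in frames.
    """
    n = len(f0)
    if k < 1 or k * min_len > n:
        raise ValueError("contour too short for the requested segmentation")
    t = list(range(n))
    inf = float("inf")
    # cost[j][i]: best total error covering the first i frames with j segments
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j * min_len, n + 1):
            for s in range((j - 1) * min_len, i - min_len + 1):
                c = cost[j - 1][s] + line_fit_error(t[s:i], f0[s:i])
                if c < cost[j][i]:
                    cost[j][i] = c
                    back[j][i] = s
    # Trace the boundaries back from the end of the contour.
    bounds = []
    i = n
    for j in range(k, 0, -1):
        s = back[j][i]
        bounds.append((s, i))
        i = s
    return bounds[::-1]


# Toy contour in Hz (invented values): onset rise, level stretch, offset fall.
toy_f0 = [100, 120, 140, 160] + [200] * 6 + [180, 150, 120, 90]
print(segment_f0(toy_f0, 3))  # [(0, 4), (4, 10), (10, 14)]
```

In this toy case the middle locus (frames 4 to 9) would be the natural nucleus candidate, with the first and last loci playing the role of articulatory transitions. On real contours the choice among candidates is what the discriminant-analysis predictor decides.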
