Overlapped di-tone modeling for tone recognition in continuous Cantonese speech

This paper presents a novel approach to tone recognition in continuous Cantonese speech based on overlapped di-tone Gaussian mixture models (ODGMM). The ODGMM is designed around the fact that Cantonese tone identification relies more on the relative pitch level than on the pitch contour. A di-tone unit covers two consecutive tone occurrences, so the tone sequence carried by a Cantonese utterance can be viewed as a chain of such units in which adjacent units overlap by exactly one tone. For each di-tone unit, a GMM is trained on a 10-dimensional feature vector that characterizes the F0 movement within the unit. In particular, the di-tone models capture the relative deviation between the F0 levels of the two tones. The Viterbi decoding algorithm is adopted to search for the optimal tone sequence under the phonological constraints on syllable-tone combination. Experimental results show that the ODGMM approach significantly outperforms previously proposed methods for tone recognition in continuous Cantonese speech.
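The search over overlapped di-tone units can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ditone_units` and `viterbi_ditone`, the data layout, and the toy scores are all assumptions made for illustration. In particular, the di-tone log-likelihoods are supplied directly as numbers, whereas in the paper they would come from GMMs evaluated on the 10-dimensional F0 feature vectors. Because adjacent units share one tone, the score of a tone sequence is the sum of its di-tone scores, and the phonological constraints are represented here as a set of permitted tones per syllable.

```python
def ditone_units(tones):
    """Split a tone sequence into di-tone units that overlap by one tone."""
    return [(tones[i], tones[i + 1]) for i in range(len(tones) - 1)]


def viterbi_ditone(scores, allowed):
    """Find the best-scoring tone sequence.

    scores[t][(a, b)] -- hypothetical log-likelihood of di-tone (a, b) for the
                         unit spanning syllables t and t + 1 (assumed to be
                         precomputed, e.g. by a di-tone GMM).
    allowed[i]        -- set of tones permitted for syllable i (stands in for
                         the syllable-tone constraints).
    Returns (best_path, best_score).
    """
    best = {tone: 0.0 for tone in allowed[0]}  # best score ending in each tone
    back = []                                  # one backpointer table per unit
    for t in range(len(allowed) - 1):
        new_best, ptr = {}, {}
        for b in allowed[t + 1]:
            # Extend every surviving hypothesis with the di-tone (a, b).
            candidates = [
                (best[a] + scores[t].get((a, b), float("-inf")), a)
                for a in best
            ]
            new_best[b], ptr[b] = max(candidates)
        back.append(ptr)
        best = new_best

    # Trace back from the best final tone.
    last = max(best, key=best.get)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


# Toy example: three syllables, made-up scores and constraints.
allowed = [{1, 2}, {3}, {1, 6}]
scores = [
    {(1, 3): -1.0, (2, 3): -0.5},   # unit covering syllables 0 and 1
    {(3, 1): -2.0, (3, 6): -0.25},  # unit covering syllables 1 and 2
]
path, score = viterbi_ditone(scores, allowed)
print(path, score)  # [2, 3, 6] -0.75
```

In this toy run the decoded sequence is built from the units (2, 3) and (3, 6), which share the middle tone 3, mirroring the one-tone overlap described above.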
