Tone articulation modeling for Mandarin spontaneous speech recognition

Tone modeling is an unavoidable problem in Mandarin speech recognition. In continuous speech, the pitch contour exhibits variable patterns, and it is strongly influenced by its tone context. Although several effective methods have been proposed to improve the accuracy for tonal syllables in Mandarin continuous speech recognition, many recognition errors are caused by poor tone discrimination capability of the acoustic model. Furthermore, the case becomes worse for the recognition of spontaneous speech. In this paper, we report our work on tone articulation modeling. Tone context dependent models are used to model unstable pitch patterns caused by co-articulation in continuous speech. Corresponding acoustic features are investigated as well. Our methods are evaluated on two test sets: one is reading-style speech data, the other is spontaneous. The experimental results show that for the test set of casual speech, the proposed method turns out to be more effective than tone context independent model, while they are comparable for the test set of reading-style speech. Several factors which have potential to improve the proposed method are discussed in the final part in this paper.

[1]  Chao Huang,et al.  Large vocabulary Mandarin speech recognition with different approaches in modeling tones , 2000, INTERSPEECH.

[2]  Haiping Li,et al.  Recognize tone languages using pitch information on the main vowel of each syllable , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[3]  Hsiao-Chuan Wang,et al.  Hidden Markov model for Mandarin lexical tone recognition , 1988, IEEE Trans. Acoust. Speech Signal Process..

[4]  Detlev Langmann,et al.  A comparative study of linear feature transformation techniques for automatic speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[5]  Keikichi Hirose,et al.  Anchoring hypothesis and its application to tone recognition of Chinese continuous speech , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[6]  Frank Seide,et al.  Pitch tracking and tone features for Mandarin speech recognition , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[7]  Michael Picheny,et al.  New methods in continuous Mandarin speech recognition , 1997, EUROSPEECH.

[8]  Frank Seide,et al.  Two-stream modeling of Mandarin tones , 2000, INTERSPEECH.

[9]  Emily Q. Wang,et al.  Pitch targets and their realization: Evidence from Mandarin Chinese , 2001, Speech Commun..

[10]  Yu Shi,et al.  Segmental tonal modeling for phone set design in Mandarin LVCSR , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.