Using tone information in Cantonese continuous speech recognition

In Chinese languages, tones carry important information at various linguistic levels. This research is based on the belief that tone information, if acquired accurately and utilized effectively, contributes to the automatic speech recognition of Chinese. In particular, we focus on the Cantonese dialect, which is spoken by tens of millions of people in Southern China and Hong Kong. Cantonese is well known for its complicated tone system, which makes automatic tone recognition very difficult. This article describes an effective approach to explicit tone recognition of Cantonese in continuously spoken utterances. Tone feature vectors are derived, on a short-time basis, to characterize the syllable-wide patterns of F0 (fundamental frequency) and energy movements. A moving-window normalization technique is proposed to reduce the tone-irrelevant fluctuation of F0 and energy features. Hidden Markov models are employed for context-dependent acoustic modeling of different tones. A tone recognition accuracy of 66.4% has been achieved in the speaker-independent case. The recognized tone patterns are then utilized to assist Cantonese large-vocabulary continuous speech recognition (LVCSR) via a lattice expansion approach. Experimental results show that reliable tone information helps to improve the overall performance of LVCSR.
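
The idea behind moving-window normalization can be illustrated with a minimal sketch. This is not the formulation used in this work, which the abstract does not specify. It assumes one plausible reading: each voiced frame's log-F0 is expressed relative to the mean log-F0 of the voiced frames in a surrounding window, so that slow, tone-irrelevant drift (speaker pitch level, sentence-level declination) is reduced. The function name `moving_window_normalize`, the `half_window` parameter, and the convention that 0.0 marks an unvoiced frame are all hypothetical.

```python
import math


def moving_window_normalize(f0_hz, half_window=50):
    """Subtract a local mean of log-F0 from each voiced frame.

    f0_hz: per-frame F0 values in Hz; 0.0 marks an unvoiced frame (assumed convention).
    half_window: hypothetical half-width of the window, in frames.
    Returns normalized log-F0 for voiced frames and None for unvoiced frames.
    """
    # Work in the log domain so that a constant pitch scaling becomes an offset.
    log_f0 = [math.log(v) if v > 0 else None for v in f0_hz]

    normalized = []
    for i, value in enumerate(log_f0):
        if value is None:
            normalized.append(None)
            continue
        # Window of neighbouring frames, clipped at the utterance boundaries.
        lo = max(0, i - half_window)
        hi = min(len(log_f0), i + half_window + 1)
        # Only voiced frames contribute; the current frame is always included.
        voiced = [x for x in log_f0[lo:hi] if x is not None]
        local_mean = sum(voiced) / len(voiced)
        normalized.append(value - local_mean)
    return normalized
```

Under these assumptions, two speakers producing the same contour at different pitch levels (for example, one an octave above the other) yield the same normalized values. The same windowing could in principle be applied to the energy feature.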
