Paper information - Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR

Tone-enhanced generalized character posterior probability (GCPP), a generalized form of posterior probability at the subword (Chinese character) level, is proposed as a rescoring metric for improving Cantonese LVCSR performance. GCPP is computed from a tone score together with the corresponding acoustic and language model scores. The tone score is produced by a supra-tone model, which characterizes the tone contour not only of a single syllable but also of its adjacent syllables, and which significantly outperforms conventional tone models. The search network is constructed by first converting the original word graph to a restructured word graph, then to a character graph, and finally to a character confusion network (CCN). Using tone-enhanced GCPP, decoding either minimizes the character error rate (CER) or maximizes the GCPP product over a chosen graph. Experimental results show that tone-enhanced GCPP reduces the character error rate by up to 15.1% relative.
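The rescoring idea in the abstract can be illustrated with a minimal sketch. It assumes each slot of a character confusion network holds competing characters with acoustic, language model, and tone log-scores, and that the three are combined with exponent weights before normalization. The function names, the weights `w_ac`, `w_lm`, `w_tone`, and all score values are hypothetical; the paper's actual formulation and weight settings are not given here.

```python
import math


def slot_posteriors(candidates, w_ac=1.0, w_lm=1.0, w_tone=1.0):
    """Normalize weighted log-scores of the competing characters in one slot.

    candidates maps a character to (acoustic, language, tone) log-scores.
    The weights are illustrative assumptions, not values from the paper.
    """
    combined = {
        ch: w_ac * ac + w_lm * lm + w_tone * tone
        for ch, (ac, lm, tone) in candidates.items()
    }
    # Log-sum-exp keeps the normalization numerically stable.
    peak = max(combined.values())
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in combined.values()))
    return {ch: math.exp(s - log_norm) for ch, s in combined.items()}


def decode(network, **weights):
    """Pick the highest-posterior character in each slot.

    Slots are scored independently here, so taking the best character per
    slot also maximizes the product of the per-slot posteriors.
    """
    result = []
    for slot in network:
        post = slot_posteriors(slot, **weights)
        best = max(post, key=post.get)
        result.append((best, post[best]))
    return result


# Toy two-slot network with made-up log-scores.
toy_network = [
    {"詩": (-10.2, -2.1, -0.5), "史": (-10.0, -2.0, -2.5)},
    {"人": (-8.0, -1.0, -0.7), "仁": (-8.5, -3.0, -0.9)},
]

for ch, p in decode(toy_network):
    print(ch, round(p, 3))
```

In this toy data the first slot flips when the tone score is ignored: calling `decode(toy_network, w_tone=0.0)` selects the other candidate, which is the kind of correction a tone score is meant to provide.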

Frank K. Soong | Tan Lee | Yao Qian

Published in: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings.