Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR

Tone-enhanced generalized character posterior probability (GCPP), a generalized form of posterior probability at the subword (Chinese character) level, is proposed as a rescoring metric for improving Cantonese LVCSR performance. GCPP is computed from the tone score together with the corresponding acoustic and language model scores. The tone score is the output of a supra-tone model, which characterizes the tone contour of a single syllable as well as those of adjacent syllables, and which significantly outperforms conventional tone models. The search network is constructed by converting the original word graph first into a restructured word graph, then into a character graph, and finally into a character confusion network (CCN). Using tone-enhanced GCPP, decoding either minimizes the character error rate (CER) or maximizes the GCPP product over the chosen graph. Experimental results show that tone-enhanced GCPP reduces the character error rate by up to 15.1% relative.
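As a rough illustration of how a tone score can enter a posterior-based rescoring step, the following minimal sketch combines weighted acoustic, language model, and tone log scores for the competing characters in one slot of a confusion network, normalizes them into posteriors, and picks the most probable character per slot. It is a simplification under stated assumptions and not the paper's formulation: the function names, the weights `w_ac`, `w_lm`, `w_tone`, the per-slot normalization, and the placeholder candidates "A" and "B" are all hypothetical.

```python
import math

def slot_posteriors(candidates, w_ac=1.0, w_lm=1.0, w_tone=1.0):
    """Return a posterior for each candidate character in one slot.

    candidates maps a character to a tuple of log scores
    (acoustic, language_model, tone). The weights are hypothetical
    tuning parameters, not values taken from the paper.
    """
    combined = {
        ch: w_ac * ac + w_lm * lm + w_tone * tone
        for ch, (ac, lm, tone) in candidates.items()
    }
    # Log-sum-exp normalization, shifted by the maximum for stability.
    top = max(combined.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in combined.values()))
    return {ch: math.exp(s - log_z) for ch, s in combined.items()}

def decode(ccn, **weights):
    """Pick the highest-posterior character in every slot of the network."""
    result = []
    for slot in ccn:
        post = slot_posteriors(slot, **weights)
        result.append(max(post, key=post.get))
    return result

# Toy network with a single slot and two placeholder candidates.
ccn = [{"A": (-10.0, -2.0, -5.0), "B": (-10.5, -2.0, -1.0)}]
without_tone = decode(ccn, w_tone=0.0)
with_tone = decode(ccn, w_tone=1.0)
```

In this toy slot the tone score overturns the decision made from the acoustic and language model scores alone, which is the kind of effect the rescoring is meant to capture. Choosing the top character independently in each slot also maximizes the product of the per-slot posteriors along the path.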
