Tone-Enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR

Tone-enhanced generalized character posterior probability (GCPP), a generalized form of posterior probability at the subword (Chinese character) level, is proposed as a rescoring metric for improving Cantonese LVCSR performance. The search network is constructed in three steps: the original word graph is converted to a restructured word graph, then to a character graph, and finally to a character confusion network (CCN). Decoding over the chosen graph, using GCPP enhanced with tone information, either minimizes the character error rate (CER) or maximizes the GCPP product. Experimental results show that tone-enhanced GCPP yields a relative reduction in character error rate of up to 15.1%.
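As a rough illustration of decoding over a CCN, the sketch below picks the best-scoring character in each slot, which maximizes the product of per-slot scores when slots are treated as independent. The abstract does not say how tone information is combined with GCPP; the log-linear combination, the `tone_weight` parameter, the function names, and the toy posteriors are all assumptions made for illustration, and the character labels are placeholders.

```python
import math

def tone_enhanced_score(char_post, tone_post, tone_weight):
    """Assumed log-linear combination of a character posterior and a tone posterior."""
    return math.log(char_post) + tone_weight * math.log(tone_post)

def decode_ccn(ccn, tone_weight=0.5):
    """Return the best-scoring character from each slot of a character confusion network.

    `ccn` is a list of slots; each slot maps a candidate character to a
    (character posterior, tone posterior) pair. This layout is hypothetical.
    """
    best = []
    for slot in ccn:
        char = max(
            slot,
            key=lambda c: tone_enhanced_score(slot[c][0], slot[c][1], tone_weight),
        )
        best.append(char)
    return best

# Toy network with placeholder character labels and made-up posteriors.
ccn = [
    {"A": (0.60, 0.70), "B": (0.40, 0.30)},
    {"C": (0.45, 0.80), "D": (0.55, 0.20)},
]

without_tone = decode_ccn(ccn, tone_weight=0.0)  # character posterior only
with_tone = decode_ccn(ccn, tone_weight=0.5)     # tone evidence included
print(without_tone, with_tone)
```

In this toy case the second slot flips from `D` to `C` once the tone posterior is weighted in, which is the kind of correction tone evidence is meant to provide.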
