Discriminative Lexicon Adaptation for Improved Character Accuracy - A New Direction in Chinese Language Modeling

Although out-of-vocabulary (OOV) words are a problem for automatic speech recognition (ASR) in most languages, in Chinese the problem can be avoided by using character n-grams, which yield moderate performance. However, character n-grams have their own limitations, and properly adding new words to the lexicon can improve ASR performance. Here we propose a discriminative lexicon adaptation approach for improved character accuracy, which not only adds new words but also deletes some words from the current lexicon. Unlike other lexicon adaptation approaches, ours takes the acoustic features into account and makes the lexicon adaptation criterion consistent with the one used in the decoding process. The proposed approach not only improves ASR character accuracy but also significantly enhances the performance of a character-based spoken document retrieval system.
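The coverage argument behind character n-grams can be illustrated with a minimal sketch. The toy lexicon, the test sentence, and the helper `oov_rate` below are hypothetical and chosen only for illustration. They are not taken from the paper and do not implement the proposed adaptation method. The sketch only shows that a word absent from the lexicon can still be fully covered at the character level.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of tokens that are missing from the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical toy word lexicon (illustrative only).
word_lexicon = {"語音", "辨識", "系統"}

# Character inventory derived from the lexicon words.
char_inventory = {ch for word in word_lexicon for ch in word}

# Hypothetical pre-segmented test sentence; "語系" is not a lexicon word,
# but both of its characters occur in the character inventory.
test_words = ["語音", "辨識", "語系"]
test_chars = [ch for word in test_words for ch in word]

word_oov = oov_rate(test_words, word_lexicon)    # one of three words is OOV
char_oov = oov_rate(test_chars, char_inventory)  # every character is covered

print(f"word-level OOV rate: {word_oov:.2f}")
print(f"character-level OOV rate: {char_oov:.2f}")
```

In this toy setting the unseen word is an OOV item for the word lexicon, while the character units still cover it. That is the sense in which character n-grams sidestep the OOV problem, at the cost of the limitations the abstract mentions.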
