Investigation of using different Chinese word segmentation standards and algorithms for automatic speech recognition

Chinese word segmentation (CWS) is a necessary step in Mandarin Chinese automatic speech recognition (ASR), and it affects ASR results. However, few studies have examined the relationship between CWS and ASR. Building a segmenter involves choosing CWS settings, namely a segmentation standard and a segmentation algorithm. In this paper, we investigate how four CWS standards and three CWS algorithms (maximum matching, a term-frequency-based algorithm, and a conditional random field (CRF) based algorithm) affect ASR performance. Our experiments on the Second SIGHAN Bakeoff data and on Mandarin Chinese conversational telephone speech show that better segmentation performance does not necessarily lead to better ASR performance. Maximum matching and the term-frequency-based algorithm, both of which are lexicon-based, make it easier to update the vocabulary inventory according to the needs of an application. We find that these two algorithms can provide ASR performance similar to that of the CRF-based algorithm. Motivated by the availability of huge amounts of web text data, we investigate whether such data can improve the term-frequency-based algorithm and, in turn, ASR performance. Lastly, we find that combining the two lexicon-based algorithms through language model interpolation can further improve ASR performance.
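The abstract does not describe how the two lexicon-based segmenters are implemented. As a rough illustration only, the Python sketch below contrasts forward maximum matching with a term-frequency-based segmenter. The latter is assumed here to pick the segmentation that maximizes the product of unigram word probabilities estimated from term frequencies; this is a common formulation, not necessarily the one used in this paper. The lexicon and counts in `TOY_FREQS`, and the function names, are hypothetical and exist only for this example.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy lexicon with invented term frequencies (illustration only).
TOY_FREQS: Dict[str, int] = {
    "研究": 50, "研究生": 10, "生命": 40, "命": 5, "起源": 30,
    "研": 1, "究": 1, "生": 8, "起": 3, "源": 2, "的": 100,
}


def forward_maximum_matching(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: always take the longest lexicon word."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


def frequency_based_segment(text: str, freqs: Dict[str, int], max_len: int = 4) -> List[str]:
    """Dynamic programming: maximize the sum of unigram log-probabilities."""
    total = sum(freqs.values())

    def logp(word: str) -> float:
        # Unknown single characters get a small pseudo-count (rough, unnormalized smoothing).
        return math.log(freqs.get(word, 0.1) / total)

    n = len(text)
    best = [-math.inf] * (n + 1)  # best[j]: best score for text[:j]
    back = [0] * (n + 1)          # back[j]: start index of the last word in text[:j]
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            word = text[i:j]
            if word not in freqs and j - i > 1:
                continue  # only lexicon words or single characters are allowed
            score = best[i] + logp(word)
            if score > best[j]:
                best[j], back[j] = score, i
    # Follow the back-pointers from the end of the string.
    words: List[str] = []
    j = n
    while j > 0:
        i = back[j]
        words.append(text[i:j])
        j = i
    return words[::-1]


if __name__ == "__main__":
    sentence = "研究生命起源"  # "study the origin of life"
    print(forward_maximum_matching(sentence, set(TOY_FREQS)))  # ['研究生', '命', '起源']
    print(frequency_based_segment(sentence, TOY_FREQS))        # ['研究', '生命', '起源']
```

On this toy input, greedy maximum matching commits to the longest prefix "研究生" (graduate student) and produces an implausible split. The frequency-based search scores whole segmentations and recovers "研究 / 生命 / 起源". Both segmenters depend only on the word list and, for the second, its counts. Adding vocabulary therefore only requires editing that table, which illustrates the flexibility of lexicon-based algorithms described above.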
