Combining recurrent neural networks and factored language models during decoding of Code-Switching speech

In this paper, we present our latest investigations of language modeling for Code-Switching. Since only little text material for Code-Switching speech is available, we integrate syntactic and semantic features into the language modeling process. In particular, we use part-of-speech tags, language identifiers, Brown word clusters and clusters of open class words. We develop factored language models and convert recurrent neural network language models into backoff language models for efficient use during decoding. A detailed error analysis reveals the strengths and weaknesses of the different language models. When we interpolate the models linearly, we reduce the perplexity by 15.6% relative on the SEAME evaluation set. This is even slightly better than the result of the unconverted recurrent neural network. We also combine the language models during decoding and obtain a mixed error rate reduction of 4.4% relative on the SEAME evaluation set.
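The linear interpolation of language models mentioned above can be sketched as follows. This is a minimal illustration, not the paper's actual setup: the toy unigram probability tables (standing in for the factored LM and the converted RNN backoff LM), the example word sequence, and the equal weights of 0.5 are all assumptions made for the sake of the example.

```python
import math

def interpolate(p_models, weights):
    """Linearly interpolate per-word probabilities from several LMs:
    p(w) = sum_i lambda_i * p_i(w)."""
    return sum(w * p for w, p in zip(weights, p_models))

def perplexity(sentence, models, weights):
    """Perplexity of a word sequence under the interpolated model."""
    log_prob = 0.0
    for word in sentence:
        # Small floor probability for out-of-vocabulary words.
        p = interpolate([m.get(word, 1e-10) for m in models], weights)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(sentence))

# Hypothetical toy unigram tables; real systems would use n-gram or
# factored models and tune the interpolation weights on held-out data.
factored_lm = {"ich": 0.2, "go": 0.1, "school": 0.05}
rnn_backoff_lm = {"ich": 0.15, "go": 0.2, "school": 0.1}

sentence = ["ich", "go", "school"]
ppl_factored = perplexity(sentence, [factored_lm], [1.0])
ppl_interp = perplexity(sentence, [factored_lm, rnn_backoff_lm], [0.5, 0.5])
print(ppl_factored, ppl_interp)
```

In this toy example the interpolated model assigns higher probability to the sequence than either component alone would on its weak words, so its perplexity is lower, mirroring the kind of gain reported on the SEAME evaluation set.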
