K-component recurrent neural network language models using curriculum learning

Conventional n-gram language models are known for their limited ability to capture long-distance dependencies and their brittleness with respect to within-domain variations. In this paper, we propose a k-component recurrent neural network language model using curriculum learning (CL-KRNNLM) to address within-domain variations. Based on a Dutch-language corpus, we investigate three curriculum-learning methods that exploit dedicated component models for specific sub-domains. In an oracle setting in which context information is known during testing, we experimentally test three hypotheses. The first is that domain-dedicated models perform better than general models on their specific domains. The second is that curriculum learning can be used to train recurrent neural network language models (RNNLMs) from general patterns to specific patterns. The third is that curriculum learning, used as an implicit weighting method to adjust the relative contributions of general and specific patterns, outperforms conventional linear interpolation. When context information is unknown during testing, the CL-KRNNLM still achieves a 13% relative improvement in word prediction accuracy over a conventional RNNLM. Finally, the CL-KRNNLM is tested in an additional N-best rescoring experiment on a standard data set, in which the context domains are created by clustering the training data with Latent Dirichlet Allocation and k-means clustering.
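
To make the domain-creation step concrete, the short Python sketch below clusters training documents into context domains with Latent Dirichlet Allocation followed by k-means, and adds a toy helper for the linear-interpolation baseline mentioned above. The function names, parameter values, and scikit-learn pipeline are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch (not the authors' code): derive "context domains" by
# clustering training documents on their LDA topic posteriors with k-means,
# as described for the N-best rescoring experiment.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans


def assign_context_domains(documents, n_topics=20, n_domains=4, seed=0):
    """Return one domain label per document (values 0 .. n_domains - 1)."""
    # Bag-of-words term counts for each training document.
    counts = CountVectorizer().fit_transform(documents)
    # Per-document topic distributions estimated by LDA.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topic_vectors = lda.fit_transform(counts)
    # k-means over the topic space yields the context domains; each domain
    # would then receive its own dedicated component RNNLM.
    kmeans = KMeans(n_clusters=n_domains, random_state=seed, n_init=10)
    return kmeans.fit_predict(topic_vectors)


def interpolate(p_general, p_domain, lam=0.5):
    # Conventional linear-interpolation baseline: a fixed-weight mixture of
    # the general and domain-specific next-word probabilities.
    return lam * p_general + (1.0 - lam) * p_domain


if __name__ == "__main__":
    toy_corpus = [
        "stock markets fell sharply today",
        "the parliament passed a new bill",
        "shares rallied after strong earnings",
        "the minister addressed the house",
    ]
    print(assign_context_domains(toy_corpus, n_topics=5, n_domains=2))
    print(interpolate(0.02, 0.08, lam=0.3))

In this sketch, the resulting domain labels would determine which sub-corpus trains each dedicated component model; curriculum learning, by contrast, orders the training data from general to domain-specific rather than fixing mixture weights as interpolate() does.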
