Neural network language models for low resource languages

For resource-rich languages, recent work has shown Neural Network based Language Models (NNLMs) to be an effective modeling technique for Automatic Speech Recognition, outperforming standard n-gram language models (LMs). For low-resource languages, however, the performance of NNLMs has not been well explored. In this paper, we evaluate the effectiveness of NNLMs for low-resource languages and show that NNLMs learn better word probabilities than state-of-the-art n-gram models even when the amount of training data is severely limited. We show that interpolated NNLMs obtain a lower word error rate (WER) than standard n-gram models, no matter the amount of training data. Additionally, we observe that with small amounts of data (approx. 100k training tokens), feed-forward NNLMs obtain lower perplexity than recurrent NNLMs, while for the larger data condition (500k-1M training tokens), recurrent NNLMs can obtain lower perplexity than feed-forward models.
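The abstract reports results for NNLMs linearly interpolated with n-gram models and compared by perplexity. As a minimal sketch only, not the paper's implementation, the snippet below illustrates the standard form of this combination: the interpolation weight `lam`, the helper functions, and the toy per-word probabilities are illustrative assumptions, and in practice the weight is tuned on held-out data.

```python
import math

def interpolate(p_nnlm, p_ngram, lam=0.5):
    """Linearly interpolate NNLM and n-gram word probabilities.

    lam is the weight on the NNLM probability; it is typically
    tuned on a held-out set (e.g. by grid search or EM).
    """
    return lam * p_nnlm + (1.0 - lam) * p_ngram

def perplexity(word_probs):
    """Perplexity of a word sequence from its per-word probabilities."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))

# Toy example: probabilities each model assigns to the words of one test sentence.
p_nnlm  = [0.12, 0.05, 0.20, 0.08]
p_ngram = [0.10, 0.03, 0.15, 0.06]
p_mix   = [interpolate(pn, pg, lam=0.6) for pn, pg in zip(p_nnlm, p_ngram)]

print("n-gram PPL:       %.1f" % perplexity(p_ngram))
print("NNLM PPL:         %.1f" % perplexity(p_nnlm))
print("interpolated PPL: %.1f" % perplexity(p_mix))
```

Lower perplexity on held-out text is the quantity the abstract uses to compare the feed-forward and recurrent NNLMs across training-data sizes.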
