SUBWORD LANGUAGE MODELING WITH NEURAL NETWORKS

We explore the performance of several types of language models on word-level and character-level language modeling tasks. These include two recently proposed recurrent neural network architectures, a feedforward neural network model, a maximum entropy model, and the usual smoothed n-gram models. We then propose a simple technique for learning sub-word level units from the data, and show that it combines the advantages of both character-level and word-level models. Finally, we show that neural network based language models can be an order of magnitude smaller than compressed n-gram models at the same level of performance when applied to the Broadcast News RT04 speech recognition task. By using sub-word units, the size can be reduced even further.
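The abstract does not spell out how the sub-word units are learned, so the following is only a minimal sketch of one plausible frequency-based scheme, not the paper's actual method: keep the most frequent words intact as units, and split the remaining rare words greedily into frequent character n-grams, falling back to single characters. All names here (build_subword_vocab, segment) and parameters (keep_top_words, keep_top_subwords, max_subword_len) are hypothetical illustrations.

```python
from collections import Counter

def build_subword_vocab(corpus_words, keep_top_words=1000,
                        max_subword_len=4, keep_top_subwords=500):
    """Keep the most frequent words intact; collect frequent character
    n-grams from the remaining rare words to serve as sub-word units.
    (Hypothetical sketch; the paper's method may differ.)"""
    word_counts = Counter(corpus_words)
    frequent_words = {w for w, _ in word_counts.most_common(keep_top_words)}

    # Count character n-grams only over the rare words, weighted by
    # how often each rare word occurs in the corpus.
    ngram_counts = Counter()
    for word, count in word_counts.items():
        if word in frequent_words:
            continue
        for n in range(2, max_subword_len + 1):
            for i in range(len(word) - n + 1):
                ngram_counts[word[i:i + n]] += count

    subwords = {s for s, _ in ngram_counts.most_common(keep_top_subwords)}
    return frequent_words, subwords

def segment(word, frequent_words, subwords, max_subword_len=4):
    """Greedy longest-match segmentation of one word into units;
    single characters are the guaranteed fallback, so there are no OOVs."""
    if word in frequent_words:
        return [word]
    units, i = [], 0
    while i < len(word):
        for n in range(min(max_subword_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in subwords:
                units.append(piece)
                i += n
                break
    return units
```

Under this sketch, the unit inventory is bounded by keep_top_words + keep_top_subwords plus the alphabet, which illustrates how a sub-word vocabulary can stay far smaller than a full word vocabulary while still covering out-of-vocabulary words, consistent with the abstract's claim that sub-word units reduce model size further.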
