Neural Lattice Language Models

In this work, we propose a new language modeling paradigm that can both predict and moderate the flow of information at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions, including polysemy and the existence of multiword lexical items, into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline.
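
To make the marginalization concrete, the sketch below computes a sequence probability by summing over all paths through a segmentation lattice with a forward-style dynamic program in log space. It is a minimal toy illustration, not the paper's model: the names `lattice_log_prob`, `log_add`, and `toy_scorer` are hypothetical, the scorer is a uniform stand-in for a neural conditional probability, and the real model conditions each lattice edge on a recurrent hidden state rather than only on the preceding surface words.

```python
import math

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def lattice_log_prob(words, token_log_prob, max_span=2):
    """Marginalize over all segmentations of `words` into tokens spanning
    1..max_span words (e.g. single words and two-word multiword units).

    alpha[i] holds the log of the total probability of all lattice paths
    covering words[:i]; alpha[n] is the sequence log-probability.
    `token_log_prob(token, history)` is a hypothetical edge scorer.
    """
    n = len(words)
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0
    for i in range(1, n + 1):
        for span in range(1, min(max_span, i) + 1):
            j = i - span
            token = " ".join(words[j:i])
            # Toy simplification: condition only on the preceding words,
            # not on a path-dependent neural hidden state.
            score = token_log_prob(token, words[:j])
            alpha[i] = log_add(alpha[i], alpha[j] + score)
    return alpha[n]

# Usage with a uniform scorer over a small closed vocabulary; the lattice
# covers both "new york" as one multiword token and as two separate words.
VOCAB = {"i", "live", "in", "new", "york", "new york"}
def toy_scorer(token, history):
    return math.log(1.0 / len(VOCAB)) if token in VOCAB else -math.inf

print(lattice_log_prob("i live in new york".split(), toy_scorer))
```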
