Neural Lattice Language Models

In this work, we propose a new language modeling paradigm that can both predict and moderate the flow of information at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions, including polysemy and the existence of multiword lexical items, into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a word-level baseline, and that a Chinese model that handles multi-character tokens is able to improve perplexity by 20.94% relative to a character-level baseline.
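
To make the marginalization concrete, the sketch below computes a sequence probability by summing over all paths through a segmentation lattice with a forward-style dynamic program in log space. It is a minimal toy illustration, not the paper's model: the names `lattice_log_prob`, `log_add`, and `toy_scorer` are hypothetical, the scorer is a uniform stand-in for a neural conditional probability, and the real model conditions each lattice edge on a recurrent hidden state rather than only on the preceding surface words.

```python
import math

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def lattice_log_prob(words, token_log_prob, max_span=2):
    """Marginalize over all segmentations of `words` into tokens spanning
    1..max_span words (e.g. single words and two-word multiword units).

    alpha[i] holds the log of the total probability of all lattice paths
    covering words[:i]; alpha[n] is the sequence log-probability.
    `token_log_prob(token, history)` is a hypothetical edge scorer.
    """
    n = len(words)
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0
    for i in range(1, n + 1):
        for span in range(1, min(max_span, i) + 1):
            j = i - span
            token = " ".join(words[j:i])
            # Toy simplification: condition only on the preceding words,
            # not on a path-dependent neural hidden state.
            score = token_log_prob(token, words[:j])
            alpha[i] = log_add(alpha[i], alpha[j] + score)
    return alpha[n]

# Usage with a uniform scorer over a small closed vocabulary; the lattice
# covers both "new york" as one multiword token and as two separate words.
VOCAB = {"i", "live", "in", "new", "york", "new york"}
def toy_scorer(token, history):
    return math.log(1.0 / len(VOCAB)) if token in VOCAB else -math.inf

print(lattice_log_prob("i live in new york".split(), toy_scorer))
```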
