Learning to Create and Reuse Words in Open-Vocabulary Neural Language Modeling

Fixed-vocabulary language models fail to account for one of the most characteristic statistical facts of natural language: the frequent creation and reuse of new word types. Although character-level language models offer a partial solution in that they can create word types not attested in the training corpus, they do not capture the "bursty" distribution of such words. In this paper, we augment a hierarchical LSTM language model that generates sequences of word tokens character by character with a caching mechanism that learns to reuse previously generated words. To validate our model, we construct a new open-vocabulary language modeling corpus (the Multilingual Wikipedia Corpus, MWC) from comparable Wikipedia articles in 7 typologically diverse languages and demonstrate the effectiveness of our model across this range of languages.

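The core idea described above is a two-route generative process for each word token: either spell out a new word character by character ("create") or copy a word from a cache of recently generated tokens ("reuse"), with a gate mixing the two distributions. The following is a minimal, self-contained sketch of that mixture, not the authors' implementation: the function names (`char_level_word_logprob`, `word_logprob`), the `Counter`-based cache, and the fixed interpolation gate are illustrative assumptions; in the paper both the character-level generator and the gate are parameterized by LSTMs and learned.

```python
# Sketch of a "create vs. reuse" word distribution under assumed names.
import math
from collections import Counter

def char_level_word_logprob(word, char_logprob):
    """Log-probability of spelling `word` character by character,
    terminated by an end-of-word marker (the "create" route)."""
    return sum(char_logprob(c) for c in word) + char_logprob("<eow>")

def word_logprob(word, cache, gate, char_logprob):
    """Interpolate the cache ("reuse") and character-level ("create") routes.

    gate  -- probability of copying from the cache (a learned, context-dependent
             quantity in the paper; a constant here for illustration)
    cache -- Counter of recently generated word tokens
    """
    total = sum(cache.values())
    p_cache = cache[word] / total if total else 0.0
    p_char = math.exp(char_level_word_logprob(word, char_logprob))
    return math.log(gate * p_cache + (1.0 - gate) * p_char)

if __name__ == "__main__":
    # Toy uniform character model over a small alphabet plus <eow>.
    alphabet = list("abcdefghijklmnopqrstuvwxyz") + ["<eow>"]
    uniform = lambda c: -math.log(len(alphabet))

    cache = Counter(["noriega", "panama", "noriega"])
    # A cached ("bursty") word scores far higher than its spelling alone,
    # while an unseen word falls back to the character-level route.
    print(word_logprob("noriega", cache, gate=0.5, char_logprob=uniform))
    print(word_logprob("zyzzyva", cache, gate=0.5, char_logprob=uniform))
```

This captures why a pure character-level model underestimates repeated rare words: once "noriega" is in the cache, its reuse probability no longer depends on the cost of re-spelling it.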