Learning to Compute Word Embeddings On the Fly

Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, which forces us either to pre-train embeddings on external data or to treat all rare words as out-of-vocabulary tokens sharing a single representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data, using a network trained end-to-end for the downstream task. We show that this improves results over baselines in which embeddings are trained only on the end task, for reading comprehension, recognizing textual entailment, and language modeling.
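
As a rough illustration of the idea (a minimal sketch, not the authors' exact architecture), the snippet below computes an embedding for a rare word on the fly by encoding auxiliary data about it, here a dictionary definition, with an LSTM; the definition encoder receives gradients from the downstream loss and is therefore trained end-to-end with the task model. The class name `DefinitionEmbedder`, the use of PyTorch, and the choice of an LSTM encoder are assumptions made for illustration.

```python
# Minimal sketch (PyTorch assumed): compute a rare word's embedding on the fly
# from auxiliary data (a dictionary definition), trained end-to-end with the task.
import torch
import torch.nn as nn


class DefinitionEmbedder(nn.Module):
    """Embeds a word from a learned lookup table (frequent words) or, for rare
    words, by encoding the words of its definition with an LSTM."""

    def __init__(self, vocab_size, emb_dim):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, emb_dim)           # in-vocabulary words
        self.def_encoder = nn.LSTM(emb_dim, emb_dim, batch_first=True)

    def forward(self, word_id, definition_ids=None):
        if definition_ids is None:
            # Frequent word: use the embedding trained directly on the end task.
            return self.lookup(word_id)
        # Rare word: embed the definition's words and encode them, taking the
        # final LSTM hidden state as the word's on-the-fly embedding.
        def_embs = self.lookup(definition_ids)                     # (1, def_len, emb_dim)
        _, (h_n, _) = self.def_encoder(def_embs)
        return h_n.squeeze(0)                                      # (1, emb_dim)


# Usage: gradients flow through def_encoder from the downstream loss, so the
# definition reader is learned jointly with the rest of the task model.
model = DefinitionEmbedder(vocab_size=10000, emb_dim=64)
frequent = model(torch.tensor([3]))                                # lookup path
rare = model(torch.tensor([0]), torch.tensor([[5, 17, 42, 8]]))    # definition path
```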
