Subword-level Composition Functions for Learning Word Embeddings

Subword-level information is crucial for capturing the meaning and morphology of words, especially for out-of-vocabulary (OOV) entries. We propose CNN- and RNN-based subword-level composition functions for learning word embeddings, and systematically compare them with popular word-level and subword-level models (Skip-Gram and FastText). Additionally, we propose a hybrid training scheme in which a pure subword-level model is trained jointly with a conventional word-level embedding model based on lookup tables. This increases the fitness of all types of subword-level word embeddings; the word-level embeddings can be discarded after training, leaving only a compact subword-level representation with a much smaller data volume. We evaluate these embeddings on a set of intrinsic and extrinsic tasks, showing that subword-level models have an advantage on tasks related to morphology and on datasets with a high OOV rate, and that they can be combined with other types of embeddings.
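To make the idea of a subword-level composition function concrete, here is a minimal sketch in plain Python. It is not the CNN or RNN composition proposed in the paper: it uses a simpler FastText-style sum over hashed character n-grams, and it assumes that the hybrid scheme combines the word-level and subword-level vectors by addition. All names and sizes (`DIM`, `BUCKETS`, `compose`, `embed`, the toy vocabulary) are hypothetical and chosen only for illustration, and the vectors are random rather than trained.

```python
import random
import zlib

DIM = 8          # embedding size (illustrative assumption)
BUCKETS = 1000   # number of hashed n-gram slots (illustrative assumption)

rng = random.Random(0)
# Subword table: one (untrained, random) vector per hashed character n-gram.
ngram_table = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)]
               for _ in range(BUCKETS)]
# Word-level lookup table, defined only for in-vocabulary words.
word_table = {w: [rng.uniform(-0.1, 0.1) for _ in range(DIM)]
              for w in ("walk", "walked", "talk")}


def char_ngrams(word, n_min=3, n_max=4):
    """Return the character n-grams of a word wrapped in boundary markers."""
    marked = "<" + word + ">"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def compose(word):
    """Sum-compose a word vector from its hashed character n-grams."""
    vec = [0.0] * DIM
    for gram in char_ngrams(word):
        row = ngram_table[zlib.crc32(gram.encode("utf-8")) % BUCKETS]
        vec = [a + b for a, b in zip(vec, row)]
    return vec


def embed(word, hybrid=True):
    """Subword vector, plus the word-level vector in hybrid mode (assumed additive)."""
    vec = compose(word)
    if hybrid and word in word_table:
        vec = [a + b for a, b in zip(vec, word_table[word])]
    return vec


# "talked" is out of vocabulary, so only the subword path contributes.
oov_vector = embed("talked")
# Dropping the lookup table after training corresponds to hybrid=False.
compact_vector = embed("walk", hybrid=False)
```

In this sketch an OOV word such as "talked" still receives a vector, because it shares n-grams with in-vocabulary words, and calling `embed` with `hybrid=False` mimics discarding the word-level table so that only the subword table needs to be stored.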

Xiaoyong Du | Tao Liu | Bofang Li | Aleksandr Drozd
