Generalizing Word Embeddings using Bag of Subwords

We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character n-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on the English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting the model's ability to capture the relationship between words' textual representations and their embeddings.
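The core idea above can be illustrated with a minimal sketch: a word is decomposed into character n-grams (with boundary markers), each n-gram is mapped to a learned embedding via a hashed lookup table, and the word vector is composed from those n-gram embeddings, so any unseen word still receives a vector. All names, the n-gram range, the hashing scheme, and the use of averaging here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams of length n_min..n_max from a word
    padded with boundary markers '<' and '>'."""
    padded = "<" + word + ">"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

class BagOfSubwords:
    """Toy bag-of-subwords embedder: a word vector is the average of
    the embeddings of its character n-grams (hashed into a fixed table).
    Hypothetical sketch; the table would be trained to match pre-trained
    word embeddings rather than left random as it is here."""

    def __init__(self, dim=300, n_buckets=2 ** 20, seed=0):
        rng = np.random.default_rng(seed)
        # One row per hash bucket; shared across all words.
        self.table = rng.normal(0.0, 0.1, size=(n_buckets, dim))
        self.n_buckets = n_buckets

    def embed(self, word):
        # Works for any string, including out-of-vocabulary words.
        grams = char_ngrams(word)
        idx = [hash(g) % self.n_buckets for g in grams]
        return self.table[idx].mean(axis=0)
```

Because the vector is composed from subword pieces, morphologically related words (e.g. "embed" and "embedding") share many n-grams and therefore receive related vectors, which is what makes the approach useful for rare words.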
