Mimicking Word Embeddings using Subword RNNs

Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low-resource settings.

[1]  Joakim Nivre,et al.  Universal Stanford dependencies: A cross-linguistic typology , 2014, LREC.

[2]  Wang Ling,et al.  Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation , 2015, EMNLP.

[3]  Zeljko Agic,et al.  Empirically Sampling Universal Dependencies , 2017, UDW@NoDaLiDa.

[4]  Jan Hajic,et al.  Tagging Inflective Languages: Prediction of Morphological Categories for a Rich Structured Tagset , 1998, ACL.

[5]  Joakim Nivre,et al.  UD Treebank Sampling for Comparative Parser Evaluation , 2016 .

[6]  Kemal Oflazer,et al.  Tagging and Morphological Disambiguation of Turkish Text , 1994, ANLP.

[7]  Nikola Ljubesic,et al.  Lemmatization and Morphosyntactic Tagging of Croatian and Serbian , 2013, BSNLP@ACL.

[8]  Christopher D. Manning,et al.  Better Word Representations with Recursive Neural Networks for Morphology , 2013, CoNLL.

[9]  Preslav Nakov,et al.  Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian , 2012, EACL.

[10]  Barbara Plank,et al.  Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss , 2016, ACL.

[11]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[12]  Wenpeng Yin,et al.  Learning Word Meta-Embeddings , 2016, ACL.

[13]  Hinrich Schütze,et al.  Efficient Higher-Order CRFs for Morphological Tagging , 2013, EMNLP.

[14]  Steven Skiena,et al.  Polyglot: Distributed Word Representations for Multilingual NLP , 2013, CoNLL.

[15]  Marta R. Costa-jussà,et al.  Chinese–Spanish neural machine translation enhanced with character and word bitmap fonts , 2017, Machine Translation.

[16]  Parminder Bhatia,et al.  Morphological Priors for Probabilistic Neural Word Embeddings , 2016, EMNLP.

[17]  Slav Petrov,et al.  A Universal Part-of-Speech Tagset , 2011, LREC.

[18]  Jakub Zavrel,et al.  Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets , 2000, LREC.

[19]  Kevin Gimpel,et al.  Charagram: Embedding Words and Sentences via Character n-grams , 2016, EMNLP.

[20]  Khalil Sima'an,et al.  Building a tree-bank of modern hebrew text , 2001 .

[21]  Cícero Nogueira dos Santos,et al.  Learning Character-level Representations for Part-of-Speech Tagging , 2014, ICML.

[22]  Manaal Faruqui,et al.  Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning , 2015, TACL.

[23]  Kevin Duh,et al.  DyNet: The Dynamic Neural Network Toolkit , 2017, ArXiv.

[24]  Alexander M. Rush,et al.  Character-Aware Neural Language Models , 2015, AAAI.

[25]  Phil Blunsom,et al.  Compositional Morphology for Word Representations and Language Modelling , 2014, ICML.

[26]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.