A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces à la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based on recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable from pretrained word vectors via linear regression. The transform can then be applied on the fly whenever a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing that the à la carte method requires fewer examples of words in context to learn high-quality embeddings, and we obtain state-of-the-art results on a nonce task and several unsupervised document classification tasks.
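
Concretely, the method induces a vector for a new feature by averaging the pretrained vectors of the words surrounding its occurrences and then applying the learned linear transform; the transform itself is fit by ordinary least squares so that each vocabulary word's average context vector maps approximately back onto its own pretrained vector. The NumPy sketch below illustrates these steps under stated assumptions: `embeddings` is a dict mapping words to pretrained vectors, `corpus` is a list of token lists, and the function names and `window` parameter are illustrative choices, not taken from the authors' released code.

```python
import numpy as np

def context_averages(corpus, embeddings, window=5):
    # For every word in the corpus, pool the pretrained vectors of all
    # words occurring within `window` tokens of it, then average.
    dim = len(next(iter(embeddings.values())))
    sums, counts = {}, {}
    for sentence in corpus:
        for i, word in enumerate(sentence):
            context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
            vecs = [embeddings[c] for c in context if c in embeddings]
            if vecs:
                sums[word] = sums.get(word, np.zeros(dim)) + np.sum(vecs, axis=0)
                counts[word] = counts.get(word, 0) + len(vecs)
    return {w: sums[w] / counts[w] for w in sums}

def learn_transform(embeddings, avg_contexts):
    # Least-squares fit of A so that A @ u_w approximates v_w, where u_w is
    # word w's average context vector and v_w its pretrained vector.
    words = [w for w in avg_contexts if w in embeddings]
    U = np.array([avg_contexts[w] for w in words])
    V = np.array([embeddings[w] for w in words])
    X, *_ = np.linalg.lstsq(U, V, rcond=None)  # solves U @ X ~= V
    return X.T  # so that v_w ~= A @ u_w

def embed_feature(contexts, embeddings, A):
    # Embed a new feature (rare word, n-gram, ...) from its usage contexts:
    # average the pretrained vectors of the surrounding words, then apply A.
    vecs = [embeddings[w] for ctx in contexts for w in ctx if w in embeddings]
    return A @ np.mean(vecs, axis=0)
```

For a nonce word with a single usage example, `contexts` is a one-element list holding that context's tokens; the same transform, fit once on a background corpus, is reused on the fly for every new feature.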
