Evaluation of Word Vector Representations by Subspace Alignment

Word vectors learned without supervision have proven to provide exceptionally effective features in many NLP tasks. The most common intrinsic evaluations of vector quality measure correlation with human similarity judgments. However, these often correlate poorly with how well the learned representations perform as features in downstream evaluation tasks. We present QVEC, a computationally inexpensive intrinsic evaluation measure of word embedding quality based on alignment to a matrix of features extracted from manually crafted lexical resources, and show that it obtains strong correlation with the performance of the vectors in a battery of downstream semantic evaluation tasks.
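
The abstract describes QVEC as an alignment between the dimensions of a word embedding matrix and the columns of a linguistic feature matrix built from lexical resources. As a rough illustration of that idea (a minimal sketch, not the authors' released implementation), the Python code below scores an embedding matrix by letting each embedding dimension pick the lexical-resource feature it correlates with best and summing those correlations; the function name qvec_score, the array shapes, and the random example data are all illustrative assumptions.

import numpy as np

def qvec_score(embeddings, features):
    """Toy QVEC-style score: align each embedding dimension to the
    lexical-resource feature it correlates with best and sum those
    correlations (a many-to-one alignment).

    embeddings: (n_words, d) array of word vectors
    features:   (n_words, p) array of linguistic features (e.g. supersense
                counts) for the same words, in the same row order
    """
    d = embeddings.shape[1]
    p = features.shape[1]
    score = 0.0
    for i in range(d):
        x = embeddings[:, i]
        # Pearson correlation of embedding dimension i with every feature column
        corrs = np.array([np.corrcoef(x, features[:, j])[0, 1] for j in range(p)])
        # Each embedding dimension picks its best-matching feature;
        # nanmax skips correlations that are undefined for constant columns.
        score += np.nanmax(corrs)
    return score

# Example call on random data, only to illustrate the expected shapes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 50))                      # 500 words, 50-dim embeddings
feats = (rng.random((500, 41)) < 0.1).astype(float)   # 41 binary "supersense-like" features
print(qvec_score(emb, feats))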
