What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets

In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting this intersection according to the rank of the shared contexts in the ranked lists of dependency contexts. This claim rests on the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts than other related words do. To test it, we describe and evaluate APSyn, a variant of Average Precision that, regardless of the adopted parameters, outperforms Vector Cosine and co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating non-native English-speaking applicants to US colleges (whose average, as reported in the literature, is 64.50%) as well as several state-of-the-art approaches.
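As a rough illustration of the idea, the following is a minimal Python sketch of an APSyn-style score next to Vector Cosine. It rests on several assumptions that the abstract does not state: each word is represented as a dict mapping dependency contexts to association weights (for example PPMI); the APSyn-style score keeps each word's top-n contexts and sums, over the shared ones, the inverse of their average rank; and an ESL/TOEFL item is answered by picking the candidate with the highest score. The function names (`top_ranks`, `apsyn`, `cosine`, `answer`), the default `n`, and the toy vectors are all hypothetical.

```python
import math
from functools import partial
from typing import Callable, Dict, List

# Assumption: a word vector maps each dependency context to an association weight.
Vector = Dict[str, float]


def top_ranks(vec: Vector, n: int) -> Dict[str, int]:
    """Return the n most associated contexts, mapped to their 1-based rank."""
    # Sort by descending weight; break ties alphabetically so ranks are deterministic.
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return {ctx: rank for rank, (ctx, _) in enumerate(ranked, start=1)}


def apsyn(v1: Vector, v2: Vector, n: int = 1000) -> float:
    """APSyn-style score (sketch): sum of inverse average ranks over shared top-n contexts.

    The default n=1000 is a hypothetical value, not taken from the paper.
    """
    r1, r2 = top_ranks(v1, n), top_ranks(v2, n)
    shared = r1.keys() & r2.keys()  # contexts in both top-n lists
    return sum(1.0 / ((r1[c] + r2[c]) / 2.0) for c in shared)


def cosine(v1: Vector, v2: Vector) -> float:
    """Cosine similarity over sparse vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(w * v2.get(c, 0.0) for c, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


def answer(target: str, choices: List[str], vectors: Dict[str, Vector],
           measure: Callable[[Vector, Vector], float]) -> str:
    """Pick the choice most similar to the target, as in a multiple-choice synonym item."""
    return max(choices, key=lambda c: measure(vectors[target], vectors[c]))


# Hypothetical toy vectors for illustration only, not data from the paper.
TOY_VECTORS: Dict[str, Vector] = {
    "big": {"house": 5.0, "dog": 4.0, "very": 3.0, "deal": 2.0},
    "large": {"house": 6.0, "very": 4.0, "dog": 3.0, "number": 1.0},
    "red": {"house": 2.0, "car": 5.0, "wine": 4.0},
}

if __name__ == "__main__":
    choices = ["large", "red"]
    print(answer("big", choices, TOY_VECTORS, partial(apsyn, n=3)))
    print(answer("big", choices, TOY_VECTORS, cosine))
```

With `n=3`, "big" and "large" share all three of their top contexts (score of about 1.8), while "big" and "red" share only "house" (score 0.5), so both measures pick "large" on this toy item; the example is not meant to reproduce the reported gap between APSyn and Vector Cosine. The design difference is that this sketch of APSyn uses only the ranks within each word's top-n list and ignores the absolute weights, whereas cosine uses the weights of all dimensions directly.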