Paper Information - A Comparison of Word Similarity Performance Using Explanatory and Non-explanatory Texts

Vectorial representations of words derived from large current events datasets have been shown to perform well on word similarity tasks. This paper shows that vectorial representations derived from substantially smaller explanatory text datasets, such as English Wikipedia and Simple English Wikipedia, preserve enough lexical semantic information to make these kinds of category judgments with equal or better accuracy.
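As a rough illustration of what a word similarity task involves, the sketch below scores word pairs by the cosine similarity of their vectors and compares the resulting ranking with human ratings using Spearman's rank correlation. This is a common evaluation setup and is assumed here; the abstract does not state the paper's exact procedure. The vectors, word pairs, and ratings are hypothetical toy values, not data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation using the no-ties formula."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical 3-dimensional word vectors (real embeddings have far more dimensions).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

# Hypothetical human similarity ratings on a 0-10 scale.
pairs = [
    ("cat", "dog", 8.5),
    ("car", "truck", 8.0),
    ("cat", "car", 2.0),
    ("dog", "truck", 2.5),
]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human_scores = [rating for _, _, rating in pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation: {rho:.2f}")
```

A higher correlation means the vector space orders word pairs more like human judges do, which is the sense in which one training corpus can be said to perform better than another on such a task.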

William Schuler | Lifeng Jin
