An Association Network for Computing Semantic Relatedness

Judging how closely two words (or texts) are semantically related is a cognitive process. However, previous algorithms for computing semantic relatedness rely largely on co-occurrences within textual windows and do not directly draw on human perceptions of relatedness. To bridge this perceptual gap, we propose to use free association as a signal that captures such perceptions. Free association, however, being manually evaluated, has limited lexical coverage and is inherently sparse. We expand lexical coverage and overcome this sparseness by constructing an association network of terms and concepts that combines signals from free association norms with five types of co-occurrences extracted from the rich structures of Wikipedia. Our evaluation shows that simple algorithms on this network give competitive results in computing semantic relatedness between words and between short texts.
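
As an illustration only, the general idea of merging two signal sources into one weighted network and scoring relatedness on it can be sketched as follows. This is not the method of the paper: the function names `build_network` and `relatedness`, the mixing weight `alpha`, the self-weight of 1.0, the cosine scoring over neighbour vectors, and all toy edge weights are assumptions made for this sketch.

```python
from collections import defaultdict
from math import sqrt


def build_network(free_assoc, cooccur, alpha=0.5):
    """Merge two edge-weight dicts {(u, v): weight} into one undirected graph.

    `alpha` (hypothetical) scales free-association edges; co-occurrence
    edges are scaled by 1 - alpha. Weights on the same edge are summed.
    """
    graph = defaultdict(dict)
    for edges, scale in ((free_assoc, alpha), (cooccur, 1.0 - alpha)):
        for (u, v), weight in edges.items():
            graph[u][v] = graph[u].get(v, 0.0) + scale * weight
            graph[v][u] = graph[v].get(u, 0.0) + scale * weight
    return graph


def _vector(graph, term):
    """Neighbour-weight vector of `term`, plus an assumed self-weight of 1.0."""
    vec = dict(graph.get(term, {}))
    vec[term] = 1.0
    return vec


def relatedness(graph, a, b):
    """Cosine similarity between the neighbour vectors of `a` and `b`."""
    va, vb = _vector(graph, a), _vector(graph, b)
    dot = sum(w * vb[n] for n, w in va.items() if n in vb)
    norm_a = sqrt(sum(w * w for w in va.values()))
    norm_b = sqrt(sum(w * w for w in vb.values()))
    return dot / (norm_a * norm_b)


# Toy, made-up weights purely for demonstration.
free_assoc = {("doctor", "nurse"): 0.4, ("doctor", "hospital"): 0.3}
cooccur = {("nurse", "hospital"): 0.6, ("car", "road"): 0.5}
network = build_network(free_assoc, cooccur)

print(relatedness(network, "doctor", "nurse"))
print(relatedness(network, "doctor", "car"))
```

In this toy network, "doctor" and "nurse" share neighbours and a direct edge, so they score higher than "doctor" and "car", which share nothing. The self-weight lets a direct edge between two terms contribute to their score; other choices (for example, random walks over the graph) would be equally plausible under these assumptions.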

[1]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[2]  Weiwei Guo,et al.  Improving Lexical Semantics for Sentential Semantics: Modeling Selectional Preference and Similar Words in a Latent Variable Model , 2013, HLT-NAACL.

[3]  Weiwei Guo,et al.  Modeling Sentences in the Latent Space , 2012, ACL.

[4]  Evgeniy Gabrilovich,et al.  A word at a time: computing word relatedness using temporal semantic analysis , 2011, WWW.

[5]  Thomas A. Schreiber,et al.  The University of South Florida free association, rhyme, and word fragment norms , 2004, Behavior research methods, instruments, & computers : a journal of the Psychonomic Society, Inc.

[6]  Martha Palmer,et al.  Verb Semantics and Lexical Selection , 1994, ACL.

[7]  Graeme Hirst,et al.  Evaluating WordNet-based Measures of Lexical Semantic Relatedness , 2006, CL.

[8]  Evgeniy Gabrilovich,et al.  Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis , 2007, IJCAI.

[9]  Mario Jarmasz,et al.  Roget's Thesaurus as a Lexical Resource for Natural Language Processing , 2012, ArXiv.

[10]  James J. Jenkins,et al.  THE 1952 MINNESOTA WORD ASSOCIATION NORMS , 1970 .

[11]  Zuhair Bandar,et al.  A Comparative Study of Two Short Text Semantic Similarity Measures , 2008, KES-AMSTA.

[12]  Masrah Azrifah Azmi Murad,et al.  Word Sense Disambiguation-based Sentence Similarity , 2010, COLING.

[13]  Rada Mihalcea,et al.  Semantic Relatedness Using Salient Semantic Analysis , 2011, AAAI.

[14]  Zuhair Bandar,et al.  Sentence similarity based on semantic nets and corpus statistics , 2006, IEEE Transactions on Knowledge and Data Engineering.

[15]  Grace Helen Kent,et al.  A Study Of Association In Insanity , 1910 .

[16]  Gang Wang,et al.  RC-NET: A General Framework for Incorporating Knowledge into Word Representations , 2014, CIKM.

[17]  Iraklis Varlamis,et al.  Text Relatedness Based on a Word Thesaurus , 2010, J. Artif. Intell. Res..

[18]  Michael McGill,et al.  An Evaluation of Factors Affecting Document Ranking by Information Retrieval Systems. , 1979 .

[19]  Ehud Rivlin,et al.  Placing search in context: the concept revisited , 2002, TOIS.

[20]  Xiaoying Liu,et al.  Sentence Similarity based on Dynamic Time Warping , 2007, International Conference on Semantic Computing (ICSC 2007).

[21]  Bob Rehder,et al.  How Well Can Passage Meaning be Derived without Using Word Order? A Comparison of Latent Semantic Analysis and Humans , 1997 .

[22]  Raymond J. Mooney,et al.  Multi-Prototype Vector-Space Models of Word Meaning , 2010, NAACL.

[23]  Diana Inkpen,et al.  Semantic text similarity using corpus-based word similarity and string similarity , 2008, ACM Trans. Knowl. Discov. Data.

[24]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[25]  Eneko Agirre,et al.  A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches , 2009, NAACL.

[26]  Reinhard Rapp,et al.  Computation of Word Associations Based on Co-occurrences of Words in Large Corpora , 1993, VLC@ACL.

[27]  Christiane Fellbaum,et al.  Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms , 1998 .

[28]  George W. Davidson,et al.  Roget's Thesaurus of English Words and Phrases , 1982 .

[29]  Justin Washtell,et al.  Co-Dispersion: A Windowless Approach to Lexical Association , 2009, EACL.

[30]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[31]  Christiane Fellbaum,et al.  Combining Local Context and Wordnet Similarity for Word Sense Identification , 1998 .

[32]  Zuhair Bandar,et al.  A new benchmark dataset with production methodology for short text semantic similarity algorithms , 2013, TSLP.

[33]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[34]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[35]  Evgeniy Gabrilovich,et al.  Large-scale learning of word relatedness with constraints , 2012, KDD.

[36]  David W. Conrath,et al.  Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy , 1997, ROCLING/IJCLCLP.