Lexical Semantic Relatedness with Random Graph Walks

Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from every explicit or implicit path connecting the two words in the entire graph. Our model uses a random walk over nodes and edges derived from WordNet links and corpus statistics. We treat the graph as a Markov chain and compute a word-specific stationary distribution via a generalized PageRank algorithm. Semantic relatedness of a word pair is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions. In our experiments, the resulting relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments by rank ordering at ρ = .90.
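The pipeline the abstract describes (a random walk with restart to a seed word, a stationary distribution for each word, and a divergence between two such distributions) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the six-word toy graph and its unit edge weights are invented and do not come from WordNet, the restart probability of 0.15 is an arbitrary choice, and the function names are hypothetical. The abstract does not define ZKL, so the sketch swaps in the standard Jensen-Shannon divergence as a stand-in; it is not the paper's measure.

```python
import math

def undirected(edges):
    """Build a symmetric weighted adjacency dict from (a, b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph

def personalized_pagerank(graph, seed, restart=0.15, iters=200):
    """Approximate the stationary distribution of a walk that restarts at `seed`.

    At every step the walker returns to `seed` with probability `restart`
    and otherwise follows an outgoing edge chosen in proportion to its weight.
    """
    nodes = sorted(graph)
    dist = {n: (1.0 if n == seed else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            mass = dist[n]
            if mass == 0.0:
                continue
            total = sum(graph[n].values())
            if total == 0.0:
                # Dangling node: send all of its mass back to the seed.
                nxt[seed] += mass
                continue
            for nbr, w in graph[n].items():
                nxt[nbr] += (1.0 - restart) * mass * w / total
            nxt[seed] += restart * mass
        dist = nxt
    return dist

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions
    defined over the same set of nodes. Stand-in for ZKL, not ZKL itself."""
    total = 0.0
    for n in p:
        m = 0.5 * (p[n] + q[n])
        if p[n] > 0.0:
            total += 0.5 * p[n] * math.log(p[n] / m)
        if q[n] > 0.0:
            total += 0.5 * q[n] * math.log(q[n] / m)
    return total

# Hypothetical toy graph: two small clusters joined through "object".
graph = undirected([
    ("car", "automobile", 1.0),
    ("car", "vehicle", 1.0),
    ("automobile", "vehicle", 1.0),
    ("vehicle", "object", 1.0),
    ("fruit", "object", 1.0),
    ("banana", "fruit", 1.0),
])

walks = {w: personalized_pagerank(graph, w) for w in ("car", "automobile", "banana")}
near = js_divergence(walks["car"], walks["automobile"])
far = js_divergence(walks["car"], walks["banana"])
print("car vs automobile:", near)
print("car vs banana:", far)
```

A lower divergence is read as higher relatedness. Because each distribution aggregates mass arriving along every path from the seed, the score reflects the whole graph rather than a single shortest path, which is the contrast the abstract draws with standard thesaurus-based measures.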

Thad Hughes | Daniel Ramage
