Computing Semantic Relatedness from Human Navigational Paths: A Case Study on Wikipedia

In this article, the authors present a novel approach for computing semantic relatedness and conduct a large-scale study of it on Wikipedia. Unlike existing semantic analysis methods that utilize Wikipedia's content or link structure, the authors propose to use human navigational paths on Wikipedia for this task. The authors obtain 1.8 million human navigational paths from a semi-controlled navigation experiment, a Wikipedia-based navigation game in which users are required to find short paths between two articles in a given Wikipedia article network. The results are intriguing: they suggest that (i) semantic relatedness computed from human navigational paths may be more precise than semantic relatedness computed from Wikipedia's plain link structure alone and (ii) not all navigational paths are equally useful, so intelligent selection based on path characteristics can improve accuracy. The work makes an argument for expanding the existing arsenal of data sources for calculating semantic relatedness and for considering the utility of human navigational paths for this task.
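To make the general idea concrete, the following is a minimal sketch of one way relatedness could be derived from navigational paths. It is not the authors' method: the abstract does not specify the measure, so the sliding-window co-occurrence counts, the cosine similarity, the `window` parameter, the function names, and the toy paths are all assumptions chosen for illustration.

```python
from collections import defaultdict
from math import sqrt


def cooccurrence_vectors(paths, window=2):
    """Build one sparse count vector per article.

    Assumption: two articles co-occur when one appears at most `window`
    positions after the other on the same path.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for i, source in enumerate(path):
            for target in path[i + 1 : i + 1 + window]:
                if source != target:
                    vectors[source][target] += 1
                    vectors[target][source] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(vectors, a, b):
    """Relatedness of two articles; unseen articles get an empty vector."""
    return cosine(vectors.get(a, {}), vectors.get(b, {}))


# Hypothetical toy paths; the real data set contains 1.8 million game paths.
paths = [
    ["Cat", "Mammal", "Dog"],
    ["Cat", "Pet", "Dog"],
    ["Paris", "France", "Europe"],
]
vectors = cooccurrence_vectors(paths)
cat_dog = relatedness(vectors, "Cat", "Dog")
cat_paris = relatedness(vectors, "Cat", "Paris")
```

In this toy data, "Cat" and "Dog" share navigational context ("Mammal", "Pet") and therefore score higher than "Cat" and "Paris", which never share a path. A path-selection step, as hinted at by finding (ii), could be sketched by filtering `paths` on some characteristic before counting; which characteristics help is not stated in the abstract.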
