CPRel: Semantic Relatedness Computation Using Wikipedia based Context Profiles

Semantic relatedness is a well-known problem whose significance ranges from computational linguistics to natural language processing applications. Relatedness computation is restricted by the amount of common-sense and background knowledge required to relate any two terms. This paper proposes a novel model of relatedness using context profiles built on features extracted from encyclopedic knowledge. The proposed approach uses Wikipedia to represent the context of a word in the high-dimensional space of Wikipedia labels. The semantic relatedness of a word pair is then assessed by comparing their corresponding context profiles under three different weighting schemes using the traditional cosine similarity metric. To evaluate the proposed relatedness approach, three well-known benchmark datasets are used, and it is shown that Wikipedia article contents can be used effectively to compute term relatedness. The experiments demonstrate that the proposed approach is computationally cheap as well as effective when correlated with human judgments.
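
The comparison step described above can be illustrated with a minimal sketch. It assumes a context profile is a sparse mapping from Wikipedia labels to weights; the abstract does not specify the three weighting schemes, so raw label frequency is used here purely as a placeholder. The function names and the toy label lists are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def context_profile(labels):
    """Build a sparse profile mapping each label to a weight.

    Assumption: raw frequency stands in for the paper's (unspecified)
    weighting schemes.
    """
    return Counter(labels)


def cosine(p, q):
    """Cosine similarity between two sparse label-weight profiles."""
    dot = sum(w * q[label] for label, w in p.items() if label in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is treated as unrelated to everything
    return dot / (norm_p * norm_q)


# Hypothetical toy label lists, for illustration only.
car = context_profile(["Vehicle", "Engine", "Road", "Engine"])
automobile = context_profile(["Vehicle", "Engine", "Wheel"])
banana = context_profile(["Fruit", "Plant"])

print(cosine(car, automobile))  # profiles share labels, so the score is positive
print(cosine(car, banana))      # no shared labels, so the score is zero
```

Because the profiles are sparse, only labels present in both profiles contribute to the dot product, which is one plausible reason such a comparison can be computationally cheap.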
