Determining semantic relatedness through the measurement of discrimination information using Jensen difference

The measurement of semantic relatedness has been addressed in a number of application tasks and by researchers in a variety of disciplines, and measuring the discrimination information of terms is a fundamental issue in many areas of science. In this study, we introduce relatedness measures based on discrimination measures, with the aim of making the fundamental concepts accessible and usable to the broad community of data analysis practitioners. We present an in-depth investigation of the basic concept of the discrimination information conveyed by a term, based on the Jensen difference. The discrimination measures can then be used naturally and conveniently to introduce two generic concepts of semantic relatedness. We also address the issue of estimating the arguments embedded in the relatedness measures, and then demonstrate how our method is supported by empirical evidence drawn from performance experiments. © 2009 Wiley Periodicals, Inc.
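
As a rough illustration of the quantity the abstract builds on, the sketch below computes the equal-weight Jensen difference of two discrete probability distributions under Shannon entropy, J(p, q) = H((p + q)/2) - (H(p) + H(q))/2. The function names and the example distributions are assumptions made for illustration only; the abstract does not state how the distributions are derived from terms or how the discrimination and relatedness measures are defined, so this is not the authors' method.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution (list of probabilities)."""
    # Zero-probability outcomes contribute nothing, so they are skipped.
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_difference(p, q):
    """Equal-weight Jensen difference: H((p + q) / 2) - (H(p) + H(q)) / 2.

    Hypothetical helper for illustration; p and q are assumed to be
    probability distributions over the same outcomes, in the same order.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Illustrative distributions (assumed, not taken from the paper).
identical = jensen_difference([0.5, 0.5], [0.5, 0.5])
disjoint = jensen_difference([1.0, 0.0], [0.0, 1.0])
print(identical, disjoint)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with disjoint support), and it is symmetric in its two arguments.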
