A Comparison of Corpus-Based and Structural Methods on Approximation of Semantic Relatedness in Ontologies

In this paper, the authors compare the performance of corpus-based and structural approaches to determining semantic relatedness in ontologies. A large lightweight ontology and a news corpus are used as materials. The results show that the structural measures proposed by Wu and Palmer and by Leacock and Chodorow perform better when cut-off values are used, whereas the corpus-based method Latent Semantic Analysis is more accurate at specific rank levels. Further investigation shows that the approximations produced by the structural measures and by Latent Semantic Analysis overlap only slightly, and that the methods approximate different types of relations. The results suggest that corpus-based and structural methods should be used in combination, with cut-off values selected according to the intended use case.
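
As an illustration of the two structural measures named above, the following minimal sketch computes Wu and Palmer and Leacock and Chodorow scores on a toy taxonomy and applies a cut-off. The taxonomy, the helper names, and the cut-off value are assumptions made for this example and are not the ontology, implementation, or thresholds used in the paper. The sketch uses the common node-counting formulation, in which the root has depth 1; other formulations count edges instead.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not the ontology used in the paper.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def path_to_root(concept):
    """Return the chain of concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a, b):
    """Return the deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    raise ValueError("concepts do not share a root")


# Maximum depth of the taxonomy, needed by the Leacock and Chodorow measure.
MAX_DEPTH = max(depth(c) for c in PARENT)


def wu_palmer(a, b):
    """Wu and Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def leacock_chodorow(a, b):
    """Leacock and Chodorow: -log(path_length / (2 * max_depth)), path counted in nodes."""
    lcs = lowest_common_subsumer(a, b)
    path_nodes = depth(a) + depth(b) - 2 * depth(lcs) + 1
    return -math.log(path_nodes / (2 * MAX_DEPTH))


def related_above_cutoff(measure, concept, cutoff):
    """Return the other concepts whose score reaches the cut-off, best first."""
    scored = [(measure(concept, other), other) for other in PARENT if other != concept]
    return [other for score, other in sorted(scored, reverse=True) if score >= cutoff]
```

With these assumptions, `related_above_cutoff(wu_palmer, "dog", 0.6)` keeps only `animal` and `cat`, which shows how the choice of cut-off determines which concepts count as related for a given use case.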
