Object Similarity in Ontologies: A Foundation for Business Intelligence Systems and High-Performance Retrieval

Finding good algorithms for assessing the similarity of complex objects in ontologies is central to techniques such as retrieval, matchmaking, clustering, data mining, semantic sense disambiguation, ontology translation, and simple object comparison. These techniques underpin a wide variety of business intelligence tasks, such as finding a process in a best-practice repository, finding a suitable service provider or outsourcing partner for agile process enactment, dynamic customer segmentation, Semantic Web applications, and systems integration. To our knowledge, however, no study has systematically compared the prediction quality of ontology-based similarity measures. This paper assembles a catalogue of ontology-based similarity measures that are (partially) adapted from related domains. We compare these measures with one another within a large, real-world best-practice ontology and evaluate them in a realistic business process retrieval scenario. We find that different similarity algorithms reflect different notions of similarity. We also show how a combination of similarity measures can improve both the precision and the recall of an ontology-based, query-by-example-style object-retrieval approach. Combining the study's findings with the literature, we argue that extended studies are needed to assemble a more complete catalogue of object similarity measures that can be evaluated across many applications and ontologies.
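The abstract does not say which measures are combined or how, so the following is only a minimal sketch, under stated assumptions, of what query-by-example retrieval with combined similarity measures can look like. The is-a taxonomy, the objects, and all function names (`taxonomy_similarity`, `feature_similarity`, `combined_similarity`, `query_by_example`) are hypothetical. The two measures are common stand-ins, not the paper's catalogue: an edge-counting similarity 1/(1 + d), where d is the number of is-a edges between two concepts via their lowest common ancestor, and the Jaccard overlap of attribute sets (Tversky's ratio model with α = β = 1). The equal-weight average used to combine them is likewise an assumption.

```python
# Hypothetical toy is-a taxonomy: child -> parent (None marks the root).
TAXONOMY = {
    "Process": None,
    "SalesProcess": "Process",
    "SellProduct": "SalesProcess",
    "SellService": "SalesProcess",
    "BuyProcess": "Process",
    "BuyProduct": "BuyProcess",
}

# Hypothetical repository objects: an ontology type plus a set of attributes.
OBJECTS = {
    "p1": {"type": "SalesProcess", "features": {"web", "invoice", "b2b"}},
    "p2": {"type": "SellService", "features": {"web", "invoice", "b2b", "erp"}},
    "p3": {"type": "SellProduct", "features": {"fax"}},
    "p4": {"type": "BuyProduct", "features": {"web", "invoice", "b2b", "erp"}},
}

# Hypothetical query-by-example object.
EXAMPLE = {"type": "SellProduct", "features": {"web", "invoice", "b2b", "erp"}}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = TAXONOMY[concept]
    return path


def path_distance(a, b):
    """Count is-a edges between a and b via their lowest common ancestor."""
    depth_in_b = {c: i for i, c in enumerate(ancestors(b))}
    for i, c in enumerate(ancestors(a)):
        if c in depth_in_b:
            return i + depth_in_b[c]
    raise ValueError("concepts share no common ancestor")


def taxonomy_similarity(x, y):
    """Edge-counting similarity in (0, 1]: 1 / (1 + distance)."""
    return 1.0 / (1.0 + path_distance(x["type"], y["type"]))


def feature_similarity(x, y):
    """Jaccard overlap of attribute sets (Tversky ratio model, alpha = beta = 1)."""
    fx, fy = x["features"], y["features"]
    union = fx | fy
    return len(fx & fy) / len(union) if union else 1.0


def combined_similarity(x, y, weights=(0.5, 0.5)):
    """Weighted average of the two measures; equal weights are an assumption."""
    w_tax, w_feat = weights
    return w_tax * taxonomy_similarity(x, y) + w_feat * feature_similarity(x, y)


def query_by_example(example, objects, measure, k):
    """Rank object names by similarity to the example and keep the top k."""
    ranked = sorted(objects, key=lambda name: measure(example, objects[name]),
                    reverse=True)  # sort is stable, so ties keep insertion order
    return ranked[:k]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant names."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"p1", "p2"}  # hypothetical relevance judgments
    for label, measure in [("taxonomy", taxonomy_similarity),
                           ("features", feature_similarity),
                           ("combined", combined_similarity)]:
        top = query_by_example(EXAMPLE, OBJECTS, measure, k=2)
        print(label, top, precision_recall(top, relevant))
```

On this constructed toy data, chosen so that each single measure admits one irrelevant object into its top two, the taxonomy measure returns p3 and p1 and the feature measure returns p2 and p4 (precision and recall 0.5 each), whereas the weighted average returns p2 and p1 (precision and recall 1.0). This only illustrates the mechanism and is not evidence for the study's results; in practice the weights would have to be tuned against relevance judgments for the target ontology.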
