Computing Semantic Similarity Using Ontologies

Rajesh Thiagarajan, Geetha Manjunath, and Markus Stumptner
1 Advanced Computing Research Centre, University of South Australia, {cisrkt|mst}@cs.unisa.edu.au
2 Hewlett-Packard Labs, Bangalore, India, geetha.manjunath@hp.com

External Posting Date: July 6, 2008. Internal Posting Date: July 6, 2008. Approved for External Publication. Submitted to the International Semantic Web Conference (ISWC), 2008, Karlsruhe, Germany. © Copyright 2008 Hewlett-Packard Development Company, L.P.

Abstract. Determining the semantic similarity of two sets of words that describe two entities is an important problem in web mining (search and recommendation systems), targeted advertising, and other domains that need semantic content matching. Traditional Information Retrieval approaches can be extended to include semantics by comparing concepts instead of words or terms. Even so, they may fail to find the right matches when the concepts that represent the two entities do not overlap directly. Because the entity descriptions are treated as self-contained units, relationships that are not explicit in the descriptions are usually ignored. We extend this notion of semantic similarity to consider the inherent relationships between concepts, using ontologies. We propose simple metrics for computing semantic similarity using spreading activation networks with multiple mechanisms for activation (set-based spreading and graph-based spreading) and for concept matching (using bipartite graphs). We evaluate these metrics in the context of matching two user profiles to determine overlapping interests between users. When compared with human-computed similarity, our similarity computation results show an improvement in accuracy over other approaches. Although the techniques presented here are used to compute similarity between two user profiles, they are applicable to any content matching scenario.
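
As a rough illustration of the set-based spreading idea mentioned in the abstract, the following Python sketch expands each profile with its ancestor concepts from an ontology and then compares the expanded profiles with cosine similarity. The toy ontology (`PARENT`), the decay factor, the hop limit, and the function names are all assumptions made for this example; they are not taken from the paper, and the metrics actually proposed there may differ.

```python
import math

# Hypothetical toy ontology (assumption): child concept -> parent concept.
PARENT = {
    "python": "programming_language",
    "java": "programming_language",
    "programming_language": "computer_science",
    "databases": "computer_science",
}


def spread(profile, decay=0.5, max_hops=2):
    """Set-based spreading sketch: pass each concept's weight on to its
    ancestors, multiplying by `decay` at every hop (assumed scheme)."""
    activated = dict(profile)
    for concept, weight in profile.items():
        node, w = concept, weight
        for _ in range(max_hops):
            node = PARENT.get(node)
            if node is None:
                break
            w *= decay
            activated[node] = activated.get(node, 0.0) + w
    return activated


def cosine(a, b):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two profiles with no concept in common.
alice = {"python": 1.0}
bob = {"java": 1.0}

plain_similarity = cosine(alice, bob)
spread_similarity = cosine(spread(alice), spread(bob))
```

In this toy case the two profiles share no concept, so the plain cosine similarity is zero, while the spread profiles overlap on the shared ancestors and receive a non-zero score. Graph-based spreading and bipartite concept matching are not covered by this sketch.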
