Information Retrieval by Semantic Similarity

Semantic similarity concerns computing the similarity between terms that are conceptually similar but not necessarily lexically similar. Typically, it is computed by mapping terms to an ontology and examining their relationships in that ontology. We investigate approaches to computing the semantic similarity between natural language terms (using WordNet as the underlying reference ontology) and between medical terms (using the MeSH ontology of medical and biomedical terms). The most popular semantic similarity methods are implemented and evaluated using WordNet and MeSH. Building upon semantic similarity, we propose the Semantic Similarity based Retrieval Model (SSRM), a novel information retrieval method capable of discovering similarities between documents that contain conceptually similar terms. The most effective semantic similarity method is integrated into SSRM. SSRM has been applied to retrieval on OHSUMED (a standard TREC collection available on the Web). The experimental results demonstrate promising performance improvements over classic information retrieval methods that rely on plain lexical matching (e.g., the Vector Space Model) and also over state-of-the-art semantic similarity retrieval methods that use ontologies.
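
As a minimal illustration of what "mapping terms to an ontology and examining their relationships" can mean in practice, the sketch below computes a Wu-Palmer-style similarity on a tiny is-a hierarchy. The taxonomy, the term names, and the function names are hypothetical and are not taken from WordNet or MeSH; the measure is one of several candidates and is not claimed to be the method selected for SSRM.

```python
# Toy is-a taxonomy (hypothetical; NOT taken from WordNet or MeSH).
# Each term maps to its single parent; the root maps to None.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(term):
    """Return the list [term, parent, ..., root]."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(term))


def lowest_common_subsumer(a, b):
    """Deepest term that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for candidate in path_to_root(b):  # walks upward from b
        if candidate in ancestors_of_a:
            return candidate
    raise ValueError("terms share no ancestor")


def wu_palmer(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(wu_palmer(*pair), 3))
```

In this toy hierarchy "dog" and "cat" share the subsumer "animal" and therefore score higher than "dog" and "car", which meet only at the root, even though none of these terms share any characters. A real system would replace the `PARENT` dictionary with lookups into the actual ontology and would need to handle terms with multiple senses or multiple parents.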
