Using Semantic Distance in a Content-Based Heterogeneous Information Retrieval System

This paper makes two contributions to semantic retrieval of heterogeneous information (documents composed of text and images): (1) a new context-based semantic distance measure for textual data, and (2) an IR system that indexes documents conceptually and automatically by considering their heterogeneous content, using a domain-specific ontology. The proposed semantic distance measure is used to fuzzify our domain ontology automatically. Both proposals are evaluated. On a set of word pairs, our semantic distance measure reaches a correlation of 0.89 with human judgments, outperforming all the other measures compared. Preliminary combination results obtained on a specialized corpus of web pages are also reported.
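
The evaluation against human judgments can be illustrated with a minimal sketch. It assumes the reported figure is a Pearson correlation coefficient (the abstract does not say which statistic is used) and that distances are normalized to [0, 1] so that they can be turned into similarities as 1 - d. The function names and the toy values are hypothetical and are not taken from the paper's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_with_humans(distances, human_similarities):
    """Correlate a distance measure with human similarity ratings.

    Assumption: distances lie in [0, 1], so 1 - d is used as a similarity.
    """
    similarities = [1.0 - d for d in distances]
    return pearson(similarities, human_similarities)


if __name__ == "__main__":
    # Hypothetical toy values for four word pairs (not from the paper).
    toy_distances = [0.10, 0.35, 0.60, 0.90]   # output of some distance measure
    toy_human = [3.8, 3.0, 1.5, 0.4]           # human ratings on a 0-4 scale
    print(round(correlation_with_humans(toy_distances, toy_human), 3))
```

A rank-based statistic such as Spearman's coefficient could be substituted if only the ordering of the word pairs matters.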
