Multimodal indexing based on semantic cohesion for image retrieval

This paper introduces two novel strategies for representing multimodal images, with application to multimedia image retrieval. We consider images that are composed of both text and labels: while text describes the image content at a very high semantic level (e.g., making reference to places, dates or events), labels provide a mid-level description of the image (i.e., in terms of the objects that can be seen in the image). Accordingly, the main assumption of this work is that by combining information from text and labels we can develop very effective retrieval methods. We study standard information fusion techniques for combining both sources of information. However, although such techniques perform competitively, they do not effectively capture the content of images. Therefore, we propose two novel representations for multimodal images that attempt to exploit the semantic cohesion among terms from different modalities. These representations are based on distributional term representations, which are widely used in computational linguistics. Under the considered representations, the content of an image is modeled by a distribution of co-occurrences over terms or of occurrences over other images, in such a way that the representation can be considered an expansion of the multimodal terms in the image. We report experimental results using the SAIAPR TC12 benchmark on two sets of topics used in ImageCLEF competitions, with manually and automatically generated labels. Experimental results show that the proposed representations significantly outperform both standard multimodal techniques and unimodal methods. Results on manually assigned labels provide an upper bound on the retrieval performance that can be obtained, whereas results with automatically generated labels are encouraging. The novel representations are able to capture the content of multimodal images more effectively. We emphasize that although we
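
The two representations described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses raw counts with no term weighting, the toy images and the "lbl:" prefix that marks labels are hypothetical, and the function names are invented for illustration. Each term is first described either by the terms it co-occurs with or by the images it occurs in; an image is then represented by the normalized sum of the vectors of its terms.

```python
from collections import Counter

def term_cooccurrence_vectors(images):
    """Describe each term by counts of the terms it co-occurs with (assumed: raw counts)."""
    vectors = {}
    for terms in images:
        unique = set(terms)
        for t in unique:
            vec = vectors.setdefault(t, Counter())
            for u in unique:
                vec[u] += 1  # a term also counts as co-occurring with itself
    return vectors

def term_occurrence_vectors(images):
    """Describe each term by the indices of the images in which it occurs."""
    vectors = {}
    for idx, terms in enumerate(images):
        for t in set(terms):
            vectors.setdefault(t, Counter())[idx] += 1
    return vectors

def represent_image(terms, term_vectors):
    """Sum the vectors of an image's terms and normalize to a distribution."""
    total = Counter()
    for t in terms:
        total.update(term_vectors.get(t, {}))
    norm = sum(total.values())
    return {k: v / norm for k, v in total.items()} if norm else {}

# Hypothetical toy collection: plain strings are text terms, "lbl:" marks labels.
images = [
    ["cancun", "holiday", "lbl:sky", "lbl:sand"],
    ["lbl:sky", "lbl:sea", "sunset"],
    ["cancun", "lbl:sea", "lbl:sand"],
]

cooc = term_cooccurrence_vectors(images)
occ = term_occurrence_vectors(images)

# Distribution over terms for the second image.
rep_terms = represent_image(images[1], cooc)
# Distribution over images for the second image.
rep_images = represent_image(images[1], occ)
```

In this toy collection the second image does not contain the text term "cancun", yet that term receives non-zero weight in `rep_terms` because it co-occurs with "lbl:sea" elsewhere in the collection. This is the sense in which the representation acts as an expansion of the image's multimodal terms.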
