Multimodal indexing based on semantic cohesion for image retrieval

This paper introduces two novel strategies for representing multimodal images, with application to multimedia image retrieval. We consider images that are described by both text and labels: text describes the image content at a very high semantic level (e.g., making reference to places, dates or events), whereas labels provide a mid-level description of the image (i.e., in terms of the objects that can be seen in it). Accordingly, the main assumption of this work is that by combining information from text and labels we can develop very effective retrieval methods. We first study standard information fusion techniques for combining both sources of information. Although the performance of such techniques is highly competitive, they do not capture the content of images effectively. Therefore, we propose two novel representations for multimodal images that attempt to exploit the semantic cohesion among terms from different modalities. These representations are based on distributional term representations, which are widely used in computational linguistics. Under the considered representations, the content of an image is modeled by a distribution of co-occurrences over terms or of occurrences over other images, in such a way that the representation can be considered an expansion of the multimodal terms in the image. We report experimental results using the SAIAPR TC12 benchmark on two sets of topics used in ImageCLEF competitions, with manually and automatically generated labels. The results show that the proposed representations significantly outperform both standard multimodal techniques and unimodal methods. Results on manually assigned labels provide an upper bound on the retrieval performance that can be obtained, whereas results with automatically generated labels are encouraging. The novel representations are able to capture the content of multimodal images more effectively.
We emphasize that, although we have applied our representations to multimedia image retrieval, the same formulation can be adopted for modeling other multimodal documents (e.g., videos).
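
The idea of representing an image through distributional term representations can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the toy collection, the `t:`/`l:` prefixes for text words and labels, the function names, the raw counts, and the sum-then-normalize step are all assumptions made for illustration. Each term is described either by the terms it co-occurs with or by the images it occurs in, and an image is then described by aggregating the vectors of its own terms.

```python
from collections import Counter
import math

# Hypothetical toy collection: each image is a bag of multimodal terms.
# "t:" marks a word from the text, "l:" marks a region label (assumed convention).
images = {
    "img1": ["t:beach", "t:sunset", "l:sky", "l:sea"],
    "img2": ["t:beach", "l:sea", "l:sand"],
    "img3": ["t:city", "l:sky", "l:building"],
}

def term_cooccurrence(images):
    """Describe each term by counts of the other terms it co-occurs with."""
    rep = {}
    for terms in images.values():
        unique = set(terms)
        for t in unique:
            rep.setdefault(t, Counter()).update(unique - {t})
    return rep

def term_occurrence(images):
    """Describe each term by the images in which it occurs."""
    rep = {}
    for name, terms in images.items():
        for t in set(terms):
            rep.setdefault(t, Counter())[name] += 1
    return rep

def represent(terms, term_rep):
    """Sum the vectors of the given terms and L2-normalize (assumed aggregation)."""
    total = Counter()
    for t in terms:
        total.update(term_rep.get(t, Counter()))
    norm = math.sqrt(sum(v * v for v in total.values()))
    return {k: v / norm for k, v in total.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(k, 0.0) for k, w in a.items())

cooc = term_cooccurrence(images)
occ = term_occurrence(images)

# A query is represented the same way as an image.
query = represent(["t:beach"], cooc)
scores = {name: cosine(query, represent(terms, cooc))
          for name, terms in images.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting the vector for `img3` receives weight on `t:sunset` even though that word never appears with `img3`, because `l:sky` co-occurs with it elsewhere; this is the sense in which the representation acts as an expansion of the terms in the image. Passing `occ` instead of `cooc` to `represent` gives the variant based on occurrences over images.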
