Web image retrieval on ImagEVAL: evidences on visualness and textualness concept dependency in fusion model

In this article we present an efficient visuo-textual Web Image Retrieval (WIR) system, which ranked second best in the official evaluation of the European ImagEVAL 2006 campaign. It uses a very simple tf-idf textual analysis and subband entropy profile visual features. Our mean fusion model yields a simple but nearly state-of-the-art WIR system. We analyze the fusion behavior for each query. We then demonstrate that the "visualness" of images and the "textualness" of web pages, defined relative to the discriminant power of each feature type, are concept dependent, and that a fusion model could take advantage of their possible complementarity. Finally, we discuss how automatic estimates of these quantities may enhance WIR.
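The abstract names two ingredients on the textual and fusion side: tf-idf scoring of the web page text and a mean fusion of textual and visual scores. The sketch below illustrates only those two steps, under explicitly assumed choices: relative term frequency with idf = log(N/df), query scores obtained by summing the tf-idf weights of the query terms, and min-max normalization of each modality before equal-weight averaging. The visual scores are taken as given (the subband entropy profile extraction is not reproduced), and all function names and toy data are hypothetical. This is an illustration of the general technique, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed weighting: tf = relative term frequency, idf = log(N / df).
        vectors.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


def text_scores(query, vectors):
    """Score each page by summing the tf-idf weights of the query terms."""
    return [sum(vec.get(t, 0.0) for t in query) for vec in vectors]


def min_max(scores):
    """Rescale scores to [0, 1] so the two modalities are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def mean_fusion(text, visual):
    """Average the normalized textual and visual scores of each image."""
    return [(t + v) / 2 for t, v in zip(min_max(text), min_max(visual))]


# Toy data (hypothetical): tokenized text of the web page around each image.
pages = [["red", "car", "street"], ["blue", "sky", "sea"], ["red", "rose", "garden"]]
query = ["red", "car"]
# Hypothetical visual similarity scores, assumed to come from some visual descriptor.
visual = [0.3, 0.1, 0.8]

fused = mean_fusion(text_scores(query, tfidf_vectors(pages)), visual)
ranking = sorted(range(len(pages)), key=lambda i: fused[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

Equal weights give both modalities the same influence for every query. If visualness and textualness are concept dependent, as the abstract argues, one could instead imagine query-dependent weights in place of the fixed 0.5/0.5 average.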
