Boosting image retrieval through aggregating search results based on visual annotations

Online photo sharing systems, such as Flickr and Picasa, provide a valuable source of human-annotated photos. Textual annotations describe not only the visual content of an image but also its subjective, spatial, temporal and social dimensions, which complicates keyword-based search. In this paper we investigate a method that exploits visual annotations, e.g. notes in Flickr, to enhance the retrieval performance of keyword-based systems. For this purpose we adopt the bag-of-visual-words approach to content-based image retrieval as our baseline. We then apply rank aggregation to the top 25 results obtained with the set of visual annotations that match the keyword-based query. Retrieval experiments show significant improvements when comparing the aggregated approach with our baseline, which itself slightly outperforms text-only search. When a textual filter on the search space is combined with the aggregated approach, an additional boost in retrieval performance is observed, which underlines the need for large-scale content-based image retrieval techniques to complement text-based search.
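The abstract does not specify which rank aggregation scheme is used, but the core step it describes can be sketched with a standard Borda-style count: each visual annotation yields a ranked result list, and the top-25 entries of every list are merged into one consensus ranking. The function name and the image identifiers below are illustrative, not taken from the paper.

```python
from collections import defaultdict

def borda_aggregate(ranked_lists, top_k=25):
    """Merge several ranked result lists with a Borda-style count.

    Each inner list holds image IDs ordered best-first; only the
    top_k entries of each list contribute to the final score.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, image_id in enumerate(results[:top_k]):
            # A better (lower) rank contributes a higher score.
            scores[image_id] += top_k - rank
    # Return image IDs sorted by aggregated score, best first.
    return sorted(scores, key=scores.get, reverse=True)

# Three hypothetical result lists, one per matching visual annotation.
lists = [
    ["img3", "img1", "img7"],
    ["img1", "img3", "img5"],
    ["img1", "img7", "img2"],
]
print(borda_aggregate(lists))  # img1 ranks first: it tops two of the lists
```

Other aggregation models (e.g. median-rank or score-based fusion, as surveyed in the metasearch literature) would slot into the same loop; the choice mainly affects how ties and partial overlaps between result lists are resolved.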
