Multi-image aggregation for better visual object retrieval

We study how aggregating multiple images, on the query side or the database side, affects the performance of visual object retrieval in a Bag-of-Words framework. To this end, we first compare five multi-image aggregation methods and recommend average pooling in most cases for its advantages in accuracy, speed, and memory footprint. We then show experimentally that more images generally yield better retrieval performance. We also show that simply aggregating all query images without selection is far from optimal. Comprehensive experiments were conducted on three large-scale object retrieval datasets, and a new state of the art was achieved. This research can be applied in real applications such as mobile search, where retrieval performance improves when users snap multiple query images.
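As a rough illustration of average pooling on the query side, the sketch below averages the Bag-of-Words histograms of several query images into one vector and ranks database images by cosine similarity. It is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the weighting, normalization, or similarity measure, so the L2 normalization and cosine scoring here are assumptions, and the names `l2_normalize`, `average_pool`, and `rank_database` are hypothetical.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm; all-zero vectors are returned unchanged."""
    norm = np.linalg.norm(v)
    return v / max(norm, eps)


def average_pool(bow_vectors):
    """Average several per-image BoW histograms into a single vector.

    Assumption: pooling is an element-wise mean over visual-word counts,
    followed by L2 normalization.
    """
    stacked = np.asarray(bow_vectors, dtype=float)
    return l2_normalize(stacked.mean(axis=0))


def rank_database(query_bows, database_bows):
    """Rank database images by cosine similarity to the pooled query."""
    q = average_pool(query_bows)
    db = np.asarray(database_bows, dtype=float)
    norms = np.linalg.norm(db, axis=1, keepdims=True)
    db = db / np.maximum(norms, 1e-12)  # row-wise L2 normalization
    scores = db @ q                     # cosine similarity per database image
    order = np.argsort(-scores)         # best match first
    return order, scores


if __name__ == "__main__":
    # Toy data: two query images and three database images, 4 visual words.
    query_bows = [[2, 0, 1, 0], [0, 0, 3, 0]]
    database_bows = [[1, 0, 2, 0], [0, 5, 0, 1], [0, 0, 0, 0]]
    order, scores = rank_database(query_bows, database_bows)
    print(order, scores)
```

Because the pooled query is a single vector, the search cost and memory footprint match those of a single-image query regardless of how many query images are supplied, which is consistent with the speed and memory advantages the abstract attributes to average pooling.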