Large-Scale Content-Based Sub-Image Search

In this work, we study the problems of specific object retrieval and image retrieval, including the more challenging problem of sub-image retrieval. Given a query image of a specific object, a retrieval engine returns relevant images of the same object from a database. The thesis focuses on the bag-of-words approach, which is one of the most effective content-based approaches, especially when the specific object covers only part of the picture or is occluded or only partially visible. The thesis improves a number of components of the standard bag-of-words retrieval approach. A novel similarity measure for large-scale bag-of-words image retrieval is presented. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method, and is more discriminative than both L2-based soft assignment and Hamming embedding. The novel similarity function achieves a mean average precision that is superior to any result published in the literature on the standard datasets and protocols. We study the effect of fine quantization and very large vocabularies (up to 64 million words) and show that the performance of specific object retrieval increases with the size of the vocabulary. This observation contradicts previously published results. We further demonstrate that large vocabularies increase the speed of the tf-idf scoring step. All state-of-the-art image retrieval results in the literature have been achieved by methods that include query expansion, which brings a significant boost in performance. We introduce three modifications to automatic query expansion: (i) a method capable of preventing query expansion failure caused by the presence of confusers, (ii) an improved spatial verification and re-ranking step that incrementally builds a statistical model of the query object, and (iii) learning relevant spatial context to boost retrieval performance.
All three improvements of query expansion were evaluated on the standard Paris and Oxford datasets, and state-of-the-art results were achieved. Finally, novel problems for image retrieval are formulated. It is shown that the classical ranking of images based on similarity addresses only one of the possible user requirements. Instead of searching for the most similar images, the novel retrieval methods, zoom-in and zoom-out, answer the “What is this?” and “Where is this?” questions. In addition, two other tasks are formulated: (i) given a query and a large image dataset, find, for every pixel location in the query, an image with maximum resolution, and (ii) return the frequency with which a pixel appears in the dataset. The zoom-in and zoom-out methods required the development of two novel techniques: a hierarchical query expansion method and a geometric consistency verification step that is sufficiently robust to prevent topic drift within a zooming search. Experiments show that the proposed methods find surprisingly fine details on the tested landmarks, even those that are hardly noticeable to humans.
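The claim that large vocabularies speed up tf-idf scoring can be made concrete with a minimal sketch of standard bag-of-words scoring through an inverted file. The work done per query is the number of posting entries visited, that is, the summed lengths of the posting lists of the query's visual words. With a larger vocabulary, the same database entries are spread over more lists, so each list is shorter on average. The sketch assumes images are already quantized into integer visual-word ids and uses plain tf-idf with cosine (L2) normalization. It is not the learned similarity measure or the query expansion proposed in the thesis, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(db_words):
    """Inverted file: visual word id -> list of (image id, term count)."""
    index = defaultdict(list)
    for img_id, words in enumerate(db_words):
        for w, c in Counter(words).items():
            index[w].append((img_id, c))
    return index


def idf_weights(index, n_images):
    # idf(w) = log(N / n_w), where n_w is the number of images containing w
    return {w: math.log(n_images / len(postings)) for w, postings in index.items()}


def image_norms(index, idf, n_images):
    # L2 norm of each database tf-idf vector (raw counts as tf; any per-image
    # tf normalization would cancel out under the cosine normalization)
    sq = [0.0] * n_images
    for w, postings in index.items():
        for img_id, c in postings:
            sq[img_id] += (c * idf[w]) ** 2
    return [math.sqrt(s) for s in sq]


def score(query_words, index, idf, norms):
    """Cosine tf-idf scores via the inverted file.

    Returns (ranked list of (image id, score), number of posting entries visited).
    Only images sharing at least one visual word with the query are scored.
    """
    q = Counter(w for w in query_words if w in index)
    q_norm = math.sqrt(sum((c * idf[w]) ** 2 for w, c in q.items())) or 1.0
    acc = defaultdict(float)
    visited = 0
    for w, qc in q.items():
        for img_id, c in index[w]:  # cost is proportional to the list length
            visited += 1
            acc[img_id] += (qc * idf[w]) * (c * idf[w])
    ranked = sorted(
        ((i, s / (q_norm * norms[i])) for i, s in acc.items() if norms[i] > 0),
        key=lambda t: t[1],
        reverse=True,
    )
    return ranked, visited


# Toy usage: three database images described by visual-word ids.
db = [[1, 2, 3], [1, 1, 4], [5, 6, 7]]
index = build_index(db)
idf = idf_weights(index, len(db))
norms = image_norms(index, idf, len(db))
ranked, visited = score([1, 2], index, idf, norms)
print(ranked, visited)
```

Here `visited` counts the inverted-file entries touched by the query. For a fixed database, the average posting-list length is roughly the total number of (image, word) entries divided by the vocabulary size, which is why finer quantization tends to reduce this count.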