Rapid image retrieval for mobile location recognition

Recognizing the location and orientation of a mobile device from captured images is a promising application of image retrieval algorithms. Matching query images against an existing georeferenced database such as Google Street View enables mobile search for location-related media, products, and services. Because the field of view of a mobile device changes rapidly as the user's attention shifts, very low retrieval times are essential. These can be reduced significantly by performing feature quantization on the handheld and transferring compressed Bag-of-Features vectors to the server. To cope with the limited processing capabilities of handhelds, the quantization of high-dimensional feature descriptors must be performed at very low complexity. To this end, this paper introduces the novel Multiple Hypothesis Vocabulary Tree (MHVT) as a step towards real-time mobile location recognition. The MHVT increases the probability of assigning matching feature descriptors to the same visual word by introducing an overlapping buffer around the separating hyperplanes, allowing for soft quantization, combined with an adaptive clustering approach. Furthermore, a novel framework is introduced that integrates the probability of correct quantization into the distance calculation using an inverted file scheme. Our experiments demonstrate that our approach reduces query times by up to a factor of 10 compared with the state of the art.
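The core idea of multiple-hypothesis quantization can be illustrated with a minimal sketch. This is not the paper's implementation: the tree layout, the `overlap` parameter, and all function names are illustrative assumptions. At each node a descriptor normally follows its nearest child centroid; when the descriptor falls inside an overlap buffer around the separating hyperplane (i.e. the two nearest children are nearly equidistant), both branches are followed, so a descriptor near a quantization boundary can reach several visual words:

```python
# Hypothetical sketch of multiple-hypothesis quantization in a
# vocabulary tree. Descriptors near a separating hyperplane (within
# an "overlap" buffer) are propagated down more than one branch.

def dist2(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class Node:
    def __init__(self, children=None, word_id=None):
        self.children = children or []  # list of (centroid, Node) pairs
        self.word_id = word_id          # leaf nodes carry a visual-word index

def quantize(node, desc, overlap=0.1, words=None):
    """Collect every visual word reachable under the overlap buffer."""
    if words is None:
        words = set()
    if node.word_id is not None:        # reached a leaf: record the word
        words.add(node.word_id)
        return words
    ranked = sorted(node.children, key=lambda ch: dist2(ch[0], desc))
    best_d = dist2(ranked[0][0], desc)
    quantize(ranked[0][1], desc, overlap, words)      # nearest branch
    for centroid, child in ranked[1:]:
        # follow additional branches whose distance to the descriptor
        # is within the buffer around the best one
        if dist2(centroid, desc) - best_d <= overlap:
            quantize(child, desc, overlap, words)
    return words

# Toy two-word tree: centroids at 0.0 and 1.0, hyperplane at 0.5.
root = Node(children=[([0.0], Node(word_id=0)),
                      ([1.0], Node(word_id=1))])
print(quantize(root, [0.2]))   # far from the hyperplane: one word
print(quantize(root, [0.52]))  # near the hyperplane: both hypotheses kept
```

A hard quantizer would assign the second descriptor to a single word even though a matching descriptor in the database could easily land on the other side of the hyperplane; keeping both hypotheses is what raises the probability that matching descriptors share a visual word.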
