Location Discriminative Vocabulary Coding for Mobile Landmark Search

With the popularization of mobile devices, recent years have witnessed an emerging potential for mobile landmark search. In this scenario, the user experience heavily depends on the efficiency of query transmission over a wireless link. As sending a query photo is time consuming, recent works have proposed to extract compact visual descriptors directly on the mobile end towards low bit rate transmission. Typically, these descriptors are extracted based solely on the visual content of a query, and the location cues from the mobile end are rarely exploited. In this paper, we present a Location Discriminative Vocabulary Coding (LDVC) scheme, which achieves extremely low bit rate query transmission, discriminative landmark description, as well as scalable descriptor delivery in a unified framework. Our first contribution is a compact and location discriminative visual landmark descriptor, which is offline learnt in two-step: First, we adopt spectral clustering to segment a city map into distinct geographical regions, where both visual and geographical similarities are fused to optimize the partition of city-scale geo-tagged photos. Second, we propose to learn LDVC in each region with two schemes: (1) a Ranking Sensitive PCA and (2) a Ranking Sensitive Vocabulary Boosting. Both schemes embed location cues to learn a compact descriptor, which minimizes the retrieval ranking loss by replacing the original high-dimensional signatures. Our second contribution is a location aware online vocabulary adaption: We store a single vocabulary in the mobile end, which is efficiently adapted for a region specific LDVC coding once a mobile device enters a given region. The learnt LDVC landmark descriptor is extremely compact (typically 10–50 bits with arithmetical coding) and performs superior over state-of-the-art descriptors. We implemented the framework in a real-world mobile landmark search prototype, which is validated in a million-scale landmark database covering typical areas e.g. Beijing, New York City, Lhasa, Singapore, and Florence.

[1]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[2]  Ian H. Witten,et al.  Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .

[3]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[4]  Ian H. Witten,et al.  Managing gigabytes (2nd ed.): compressing and indexing documents and images , 1999 .

[5]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[6]  Luc Van Gool,et al.  HPAT Indexing for Fast Object/Scene Recognition Based on Local Appearance , 2003, CIVR.

[7]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[9]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[10]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[12]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Wei Zhang,et al.  Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[14]  Gang Hua,et al.  Discriminant Embedding for Local Image Descriptors , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Trevor Darrell,et al.  Adaptive Vocabulary Forests br Dynamic Indexing and Category Learning , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Mor Naaman,et al.  How flickr helps us make sense of the world: context and content in community-contributed media collections , 2007, ACM Multimedia.

[19]  Jianxiong Xiao,et al.  Structuring Visual Words in 3D for Arbitrary-View Object Localization , 2008, ECCV.

[20]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[21]  Andrzej Sluzek,et al.  Image-Based Information Guide on Mobile Devices , 2008, ISVC.

[22]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Xing Xie,et al.  Vocabulary tree incremental indexing for scalable location recognition , 2008, 2008 IEEE International Conference on Multimedia and Expo.

[24]  Chuohao Yeo,et al.  Rate-efficient visual correspondences using random projections , 2008, 2008 15th IEEE International Conference on Image Processing.

[25]  Alessandro Perina,et al.  Geo-located image analysis using latent representations , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Tom Drummond,et al.  Unified Loop Closing and Recovery for Real Time Monocular SLAM , 2008, BMVC.

[28]  Horst Bischof,et al.  From structure-from-motion point clouds to fast location recognition , 2009, CVPR.

[29]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[30]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Yang Song,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Bernd Girod,et al.  Compression of image patches for local feature extraction , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[33]  Bernd Girod,et al.  Location coding for mobile image retrieval , 2009, MobiMedia.

[34]  Rongrong Ji,et al.  Vocabulary hierarchy optimization for effective and transferable retrieval , 2009, CVPR.

[35]  Cordelia Schmid,et al.  Packing bag-of-features , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[36]  Bernd Girod,et al.  Transform coding of image feature descriptors , 2009, Electronic Imaging.

[37]  Bernd Girod,et al.  CHoG: Compressed histogram of gradients A low bit-rate feature descriptor , 2009, CVPR.

[38]  Xing Xie,et al.  Location sensitive indexing for image-based advertising , 2009, MM '09.

[39]  Bernd Girod,et al.  Tree Histogram Coding for Mobile Image Matching , 2009, 2009 Data Compression Conference.

[40]  Alexei A. Efros,et al.  Image sequence geolocation with human travel priors , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[41]  Xing Xie,et al.  Mining city landmarks from blogs by graph modeling , 2009, ACM Multimedia.

[42]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[43]  Bernd Girod,et al.  Inverted Index Compression for Scalable Image Matching , 2010, 2010 Data Compression Conference.

[44]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[45]  Bernd Girod,et al.  Comparison of local feature descriptors for mobile visual search , 2010, 2010 IEEE International Conference on Image Processing.

[46]  Yue Gao Probabilistic Principle Component Analysis on Time Lapse images , 2010 .

[47]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, International Journal of Computer Vision.

[49]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .