Context-Aware Discriminative Vocabulary Learning for Mobile Landmark Recognition

This paper proposes a discriminative vocabulary learning approach for landmark recognition that exploits the context information acquired from mobile devices. Vocabulary learning generates a set of discriminative codewords for image representation, which is important for landmark recognition. Many state-of-the-art methods use content analysis alone for vocabulary learning and therefore underutilize the context information provided by mobile devices, such as location from the GPS receiver and direction from the digital compass. Although some works have begun to consider image location information for vocabulary learning, location alone is insufficient because GPS data has significant errors in dense built-up areas. Context analysis techniques that use GPS to shortlist geographically nearby landmark candidates for subsequent image matching are at times inadequate. In view of this, the paper proposes to employ both direction and location information to learn a discriminative compact vocabulary (DCV) for mobile landmark recognition. Direction information is first used to supervise image feature clustering, constructing direction-dependent scalable vocabulary trees (DSVTs). Location information is then incorporated into the proposed DCV learning algorithm to select the discriminative codewords of the DSVTs that form the DCV. An ImageRank technique and an iterative codeword selection algorithm are developed for DCV learning. Experimental results on the NTU50Landmark database show that the proposed approach achieves a 4% improvement over the existing method in mobile landmark recognition.
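The two-stage idea described above (group features by compass direction, then keep only the codewords that discriminate between landmarks) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not specify how the DSVTs are built or how ImageRank and the iterative selection score codewords. The sector count `num_bins`, the entropy-based score, and all function names below are assumptions chosen for illustration only.

```python
import math
from collections import defaultdict


def direction_bin(heading_deg, num_bins=4):
    """Map a compass heading in degrees to one of num_bins equal sectors.

    num_bins=4 is an illustrative assumption, not a value from the paper.
    """
    width = 360.0 / num_bins
    return int((heading_deg % 360.0) // width)


def group_by_direction(samples, num_bins=4):
    """Group (heading, descriptor) pairs by direction sector.

    Each group could then be clustered separately to build one
    direction-dependent vocabulary per sector.
    """
    groups = defaultdict(list)
    for heading, descriptor in samples:
        groups[direction_bin(heading, num_bins)].append(descriptor)
    return dict(groups)


def codeword_entropy(counts_per_landmark):
    """Shannon entropy (bits) of a codeword's occurrences across landmarks.

    Low entropy means the codeword fires mostly for one landmark.
    This score is a stand-in assumption, not the paper's criterion.
    """
    total = sum(counts_per_landmark.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts_per_landmark.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def select_codewords(codeword_counts, keep):
    """Keep the `keep` codewords with the lowest entropy (ties broken by name)."""
    ranked = sorted(
        codeword_counts,
        key=lambda cw: (codeword_entropy(codeword_counts[cw]), cw),
    )
    return ranked[:keep]


if __name__ == "__main__":
    # Hypothetical toy data: codeword -> occurrences per nearby landmark.
    counts = {
        "c0": {"A": 4, "B": 0},
        "c1": {"A": 2, "B": 2},
        "c2": {"A": 0, "B": 3},
    }
    print(select_codewords(counts, keep=2))
```

In this toy setting, a codeword that occurs equally often in two nearby landmarks (`c1`) is dropped, while codewords tied to a single landmark are kept. In the paper's setting, the landmarks being compared would be those shortlisted by location.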
