Socio-mobile landmark recognition using local features with adaptive region selection

With the rapid development of mobile devices and broadband wireless networks, mobile devices play an increasingly important role in daily life, and many landmark images are now captured with them. However, these images are often captured under different lighting conditions, with varied poses and camera orientations. In addition, people are inherently connected by personal interests and by various interactions. To alleviate the imaging problems of mobile devices and to take advantage of social information in mobile visual applications, we propose a novel socio-mobile visual recognition method using local features with adaptive region selection. We densely extract local regions and represent each one by its pixel gradients. Each local region is divided into 4 x 4 subregions to incorporate spatial information. Instead of using a fixed number of pixels for each subregion, we adaptively choose the size of each subregion to cope with varied poses and camera orientations. The most discriminative local features are then chosen by minimizing the sparse coding loss. A geo-discriminative codebook is also generated to take advantage of the images' location information. Moreover, we jointly consider the visual distances and the matching results of the user's friends to further boost the final recognition performance. We achieve state-of-the-art performance on the Stanford mobile visual search dataset and the San Francisco landmark dataset. These experimental results demonstrate the effectiveness and efficiency of the proposed adaptive-region-selection-based local features for socio-mobile landmark recognition. (C) 2015 Elsevier B.V. All rights reserved.
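
As a rough illustration of the descriptor outlined in the abstract, the following Python sketch builds a gradient-based feature over a 4 x 4 grid of subregions whose pixel size is a parameter rather than a constant. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `grid_gradient_descriptor`, the 8 orientation bins, the L2 normalization, and the candidate subregion sizes are all hypothetical choices made for this example. Selecting among the candidates by sparse coding loss requires a learned codebook and is not shown.

```python
import numpy as np


def grid_gradient_descriptor(image, top, left, cell_h, cell_w, n_bins=8):
    """Describe a local region as a 4 x 4 grid of gradient-orientation histograms.

    cell_h and cell_w are the subregion height and width in pixels, so the
    region covers (4 * cell_h) x (4 * cell_w) pixels. The bin count and the
    normalization are assumptions of this sketch, not taken from the paper.
    """
    region = image[top:top + 4 * cell_h, left:left + 4 * cell_w].astype(float)
    if region.shape != (4 * cell_h, 4 * cell_w):
        raise ValueError("region falls outside the image")

    # Pixel gradients along rows (gy) and columns (gx).
    gy, gx = np.gradient(region)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)

    # Quantize orientations; the clamp guards against rounding up to n_bins.
    bins = np.minimum((orientation / (2 * np.pi) * n_bins).astype(int), n_bins - 1)

    descriptor = np.zeros((4, 4, n_bins))
    for i in range(4):
        for j in range(4):
            rows = slice(i * cell_h, (i + 1) * cell_h)
            cols = slice(j * cell_w, (j + 1) * cell_w)
            # Magnitude-weighted orientation histogram of one subregion.
            descriptor[i, j] = np.bincount(
                bins[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=n_bins,
            )

    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor


# Candidate subregion sizes (hypothetical) for one sampling position.
image = np.random.default_rng(0).random((64, 64))
candidates = {
    (h, w): grid_gradient_descriptor(image, 8, 8, h, w)
    for (h, w) in [(4, 4), (4, 6), (6, 4)]
}
```

Every candidate has the same length (4 x 4 x 8 = 128 values) regardless of subregion size, so descriptors computed with different sizes can be compared against a single codebook.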
