Discriminative Soft Bag-of-Visual Phrase for Mobile Landmark Recognition

This paper proposes a new bag-of-visual phrase (BoP) approach for mobile landmark recognition based on discriminative learning of category-dependent visual phrases. Many previous landmark recognition works adopt a bag-of-words (BoW) method which ignores the co-occurrence relationship between neighboring visual words in an image. Although some works that focus on visual phrase learning have appeared, they mainly construct a generalized phrase dictionary from all categories for recognition, which lacks descriptive capability for a specific category. Another shortcoming of these works is the hard assignment of numerous feature sets to a limited number of phrases, which causes some useful feature sets to be discarded, and yields information loss. In view of this, this paper presents a discriminative soft BoP approach for mobile landmark recognition. The candidate phrases defined as adjacent pairwise codewords are first generated for each category. The important candidates are then selected through a proposed discriminative visual phrase (DVP) selection approach to form the BoP dictionary. Finally, a soft encoding method is developed to quantize each image into a BoP histogram. The context information such as location and direction captured by mobile devices is also integrated with the proposed BoP-based content analysis for landmark recognition. Experimental results on two datasets show that the proposed method is effective in mobile landmark recognition.

[1]  Anas Al-Nuaimi,et al.  Mobile Visual Location Recognition , 2013 .

[2]  Tao Mei,et al.  Contextual Bag-of-Words for Visual Categorization , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[3]  Luc Van Gool,et al.  Object Recognition for the Internet of Things , 2008, IOT.

[4]  Qi Tian,et al.  Visual Synset: Towards a higher-level visual representation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Gang Hua,et al.  Descriptive visual words and visual phrases for image applications , 2009, ACM Multimedia.

[6]  Tao Chen,et al.  Discriminative bag-of-visual phrase learning for landmark recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  A. Robert Calderbank,et al.  Bayesian Analysis of Interference Cancellation for Alamouti Multiplexing , 2008, IEEE Transactions on Information Theory.

[8]  Tao Chen,et al.  Integrated Content and Context Analysis for Mobile Landmark Recognition , 2011, IEEE Transactions on Circuits and Systems for Video Technology.

[9]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Michel Barlaud,et al.  Fast k nearest neighbor search using GPU , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[12]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  J.-b. Ryu,et al.  Formula for Harris corner detector , 2011 .

[14]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[16]  Chung-Hsien Wu,et al.  Extraction of robust visual phrases using graph mining for image retrieval , 2010, Proceedings of 2010 IEEE International Symposium on Circuits and Systems.

[17]  Kenneth J. Hintz,et al.  A measure of the information gain attributable to cueing , 1991, IEEE Trans. Syst. Man Cybern..

[18]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[19]  Bingbing Ni,et al.  Building descriptive and discriminative visual codebook for large-scale image applications , 2010, Multimedia Tools and Applications.

[20]  Aristidis Likas,et al.  The global kernel k-means clustering algorithm , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[21]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Jiawei Han,et al.  Frequent pattern mining: current status and future directions , 2007, Data Mining and Knowledge Discovery.

[23]  Gang Hua,et al.  Building contextual visual vocabulary for large-scale image applications , 2010, ACM Multimedia.

[24]  Ming Yang,et al.  Discovery of Collocation Patterns: from Visual Words to Visual Phrases , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[26]  Qi Tian,et al.  Multiple instance learning using visual phrases for object classification , 2010, 2010 IEEE International Conference on Multimedia and Expo.

[27]  Wen Gao,et al.  Effective and efficient object-based image retrieval using visual phrases , 2006, MM '06.

[28]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Yang Song,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Zhen Li,et al.  A Comparative Study of Mobile-Based Landmark Recognition Techniques , 2010, IEEE Intelligent Systems.

[31]  Antonio Torralba,et al.  Context-based vision system for place and object recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[32]  S Kullback,et al.  LETTER TO THE EDITOR: THE KULLBACK-LEIBLER DISTANCE , 1987 .

[33]  Jiri Matas,et al.  Learning a Fine Vocabulary , 2010, ECCV.

[34]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.