Paper Information - Discriminative Soft Bag-of-Visual Phrase for Mobile Landmark Recognition

This paper proposes a new bag-of-visual phrase (BoP) approach for mobile landmark recognition based on discriminative learning of category-dependent visual phrases. Many previous landmark recognition works adopt a bag-of-words (BoW) method, which ignores the co-occurrence relationship between neighboring visual words in an image. Although some works on visual phrase learning have appeared, they mainly construct a generalized phrase dictionary from all categories for recognition, which lacks descriptive capability for a specific category. Another shortcoming of these works is the hard assignment of numerous feature sets to a limited number of phrases, which discards some useful feature sets and causes information loss. In view of this, this paper presents a discriminative soft BoP approach for mobile landmark recognition. Candidate phrases, defined as pairs of adjacent codewords, are first generated for each category. The important candidates are then selected through a proposed discriminative visual phrase (DVP) selection approach to form the BoP dictionary. Finally, a soft encoding method is developed to quantize each image into a BoP histogram. Context information captured by mobile devices, such as location and direction, is also integrated with the proposed BoP-based content analysis for landmark recognition. Experimental results on two datasets show that the proposed method is effective in mobile landmark recognition.
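
The abstract does not give the formulas behind the soft encoding step, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Gaussian-kernel soft assignment of each local descriptor to the codewords, treats two keypoints as "adjacent" when they lie within a fixed pixel radius, and takes the phrase dictionary as a given list of codeword index pairs. The function names and the parameters `radius` and `sigma` are hypothetical and chosen for illustration.

```python
import numpy as np

def soft_word_weights(descriptors, codebook, sigma=1.0):
    """Gaussian-kernel soft assignment of each descriptor to every codeword."""
    # Squared Euclidean distances, shape (n_features, n_words).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Subtract the row minimum before exponentiating for numerical stability.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def adjacent_pairs(positions, radius):
    """Index pairs (i, j), i < j, of keypoints closer than `radius` pixels."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    i, j = np.nonzero(np.triu(dist < radius, k=1))
    return list(zip(i.tolist(), j.tolist()))

def soft_bop_histogram(positions, descriptors, codebook, phrases,
                       radius=20.0, sigma=1.0):
    """L1-normalised histogram over `phrases` (codeword index pairs).

    Assumption: every adjacent feature pair votes for every phrase with a
    weight equal to the product of its two soft word-assignment weights,
    so no feature pair is dropped by a hard assignment.
    """
    w = soft_word_weights(descriptors, codebook, sigma)
    hist = np.zeros(len(phrases))
    for i, j in adjacent_pairs(positions, radius):
        for p, (a, b) in enumerate(phrases):
            hist[p] += w[i, a] * w[j, b]
            if a != b:
                # A phrase is unordered, so also count the swapped matching.
                hist[p] += w[i, b] * w[j, a]
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In this sketch a small `sigma` approaches hard assignment, while a larger `sigma` spreads each feature pair's vote over more phrases. Selecting which phrases enter the dictionary (the DVP selection step) is outside its scope.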
