Discriminative bag-of-visual phrase learning for landmark recognition

Bag-of-visual-phrase (BoP) models have recently been proposed and developed for landmark recognition. However, existing BoP methods for landmark recognition have two major shortcomings: (i) they try to construct a universal phrase vocabulary for all object categories, which lacks descriptive power specific to any particular category, and (ii) they often adopt simple criteria, such as frequency information, to mine the visual phrases, which may cause the selected phrases to be less discriminative or representative for recognition. To address these shortcomings, this paper proposes a new discriminative BoP approach for landmark recognition. First, candidate visual phrases, defined as pairs of adjacent visual words, are selected for each category. A phrase-level similarity measure in the latent space is proposed to evaluate the semantic similarity between pairs of phrases. This measure is then integrated with the phrase frequency information to shortlist the discriminative phrases for each category through a proposed phrase ranking algorithm. Finally, the BoP and bag-of-words (BoW) histograms are combined through a pyramid matching method for recognition. Experimental results on two different datasets demonstrate that the proposed method is effective in landmark recognition.
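
As a rough illustration of the first step only, candidate phrases can be sketched as unordered pairs of visual words whose keypoints lie close together in an image. The abstract does not specify the adjacency criterion, so the distance threshold `radius`, the `(x, y, word_id)` keypoint format, and the function names `candidate_phrases` and `bop_histogram` are assumptions made for this sketch. The latent-space similarity measure, the phrase ranking algorithm, and the pyramid matching step are not covered.

```python
from collections import Counter
from itertools import combinations
import math


def candidate_phrases(keypoints, radius):
    """Count unordered word pairs whose keypoints lie within `radius` pixels.

    keypoints: list of (x, y, word_id) tuples for one image (assumed format).
    radius: assumed adjacency threshold; the source does not define one.
    """
    counts = Counter()
    for (x1, y1, w1), (x2, y2, w2) in combinations(keypoints, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            # Sort the word ids so (3, 7) and (7, 3) count as the same phrase.
            counts[tuple(sorted((w1, w2)))] += 1
    return counts


def bop_histogram(counts, vocabulary):
    """Normalized frequency histogram over a given list of phrases."""
    total = sum(counts[p] for p in vocabulary)
    return [counts[p] / total if total else 0.0 for p in vocabulary]


# Hypothetical keypoints: two nearby (3, 7) pairs and one isolated word.
keypoints = [(0, 0, 3), (1, 0, 7), (10, 10, 3), (10, 11, 7), (50, 50, 1)]
counts = candidate_phrases(keypoints, radius=2)
histogram = bop_histogram(counts, vocabulary=[(3, 7), (1, 3)])
```

In this toy input the phrase `(3, 7)` occurs twice and `(1, 3)` never, so a purely frequency-based criterion would keep only `(3, 7)`. The approach described above goes further by combining such frequencies with a semantic similarity measure before ranking.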
