Learning to combine multi-resolution spatially-weighted co-occurrence matrices for image representation

Bag-of-Words is widely used to describe images for image classification. However, the approach is limited in two ways: it does not exploit the spatial relations among visual words well, and it is difficult to generate a single comprehensive vocabulary. In this paper, we propose schemes to address both issues. First, we propose a structure propagation technique that builds more reasonable co-occurrence matrices of visual words to exploit spatial information, assigning a higher weight to the co-occurrence of two patches that lie in the same object part. Second, we build a multiple-histogram representation over hierarchical vocabularies to avoid the ambiguity of a single vocabulary, and present a learning approach that combines the multiple histograms to integrate both within-vocabulary and cross-vocabulary information. We evaluate the proposed method on the Princeton sports event dataset, where it shows promising results compared with state-of-the-art methods.
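
To make the idea of a spatially-weighted co-occurrence matrix concrete, the following is a minimal sketch and not the structure propagation technique itself, which the abstract does not specify. It assumes each patch already carries a visual-word index, a position, and a hypothetical part label; the function name, the neighborhood radius, and the two weight values are all illustrative assumptions.

```python
import numpy as np

def weighted_cooccurrence(words, positions, part_ids, vocab_size,
                          radius, same_part_weight=2.0, diff_part_weight=1.0):
    """Accumulate a vocab_size x vocab_size co-occurrence matrix.

    words:     (n,) visual-word index of each patch
    positions: (n, 2) patch centers
    part_ids:  (n,) hypothetical object-part label of each patch (assumption)
    Pairs of distinct patches closer than `radius` contribute to the entry
    of their two visual words; pairs sharing a part label get a higher weight.
    """
    words = np.asarray(words)
    positions = np.asarray(positions, dtype=float)
    part_ids = np.asarray(part_ids)

    # Pairwise Euclidean distances between patch centers.
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Keep only neighboring pairs, excluding a patch paired with itself.
    near = dist <= radius
    np.fill_diagonal(near, False)
    i, j = np.nonzero(near)

    # Higher weight when both patches carry the same part label.
    pair_weight = np.where(part_ids[i] == part_ids[j],
                           same_part_weight, diff_part_weight)

    C = np.zeros((vocab_size, vocab_size))
    # np.add.at accumulates correctly when index pairs repeat.
    np.add.at(C, (words[i], words[j]), pair_weight)
    return C

# Toy example: three patches on a line, two visual words, two parts.
C = weighted_cooccurrence(
    words=[0, 1, 1],
    positions=[(0, 0), (1, 0), (2, 0)],
    part_ids=[0, 0, 1],
    vocab_size=2,
    radius=1.5,
)
print(C)
```

In this toy case the first two patches share a part, so their co-occurrence is counted with the higher weight, while the second and third patches lie in different parts and receive the lower one. Each unordered pair is visited in both orders, so the resulting matrix is symmetric.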
