Encoding Classes of Unaligned Objects Using Structural Similarity Cross-Covariance Tensors

Encoding an object's essence in terms of self-similarities between its parts is becoming a popular strategy in Computer Vision. In this paper, a new similarity-based descriptor, dubbed Structural Similarity Cross-Covariance Tensor, is proposed to encode relations among different regions of an image in terms of cross-covariance matrices. These matrices are computed between low-level feature vectors extracted from pairs of regions. The new descriptor retains the advantages of the widely used covariance matrix descriptors [1], extending their expressiveness from local similarities inside a single region to structural similarities across multiple regions. Applied on top of HOG, the new descriptor is tested on object and scene classification tasks using three datasets. The proposed method consistently outperforms the baseline HOG and yields a significant improvement over a recently proposed self-similarity descriptor on the two most challenging datasets.
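To make the idea of cross-covariance between regions concrete, here is a minimal sketch in plain Python. The abstract does not specify how feature vectors of two regions are paired or how the tensor is assembled, so the sketch assumes (1) equal-size regions whose low-level feature vectors (for example, per-cell HOG histograms) are paired by their relative position inside each region, and (2) a tensor that stacks the sample cross-covariance of every ordered pair of regions. The function names `cross_covariance` and `cross_covariance_tensor` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean(samples: List[Vector]) -> Vector:
    """Component-wise mean of a list of d-dimensional feature vectors."""
    n = len(samples)
    return [sum(s[c] for s in samples) / n for c in range(len(samples[0]))]


def cross_covariance(region_a: List[Vector], region_b: List[Vector]) -> Matrix:
    """Sample cross-covariance C[p][q] = cov(a_p, b_q) between two regions.

    Assumption (hypothetical): both regions hold the same number N >= 2 of
    feature vectors, and the k-th vector of each region is paired by position.
    """
    if len(region_a) != len(region_b) or len(region_a) < 2:
        raise ValueError("regions must contain the same number (>= 2) of samples")
    n = len(region_a)
    mu_a, mu_b = mean(region_a), mean(region_b)
    return [
        [
            sum((region_a[k][p] - mu_a[p]) * (region_b[k][q] - mu_b[q])
                for k in range(n)) / (n - 1)
            for q in range(len(mu_b))
        ]
        for p in range(len(mu_a))
    ]


def cross_covariance_tensor(regions: List[List[Vector]]) -> List[List[Matrix]]:
    """T[i][j] = cross-covariance between region i and region j.

    The diagonal blocks T[i][i] are ordinary covariance matrix descriptors;
    the off-diagonal blocks capture similarities across regions.
    """
    return [[cross_covariance(ri, rj) for rj in regions] for ri in regions]


# Toy example: two regions, each with three paired 2-D feature vectors.
region0 = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
region1 = [[3.0, 0.0], [2.0, 1.0], [1.0, 2.0]]
T = cross_covariance_tensor([region0, region1])
print(T[0][0])  # [[1.0, 2.0], [2.0, 4.0]]  (covariance of region0)
print(T[0][1])  # [[-1.0, 1.0], [-2.0, 2.0]]  (cross-covariance region0 vs region1)
```

Because the cross-covariance satisfies T[j][i] = transpose(T[i][j]), a practical implementation could store only the blocks with i <= j; the full tensor is kept here for readability.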

[1] Nebojsa Jojic et al. Spring Lattice Counting Grids: Scene Recognition Using Deformable Positional Constraints, 2012, ECCV.

[2] Vittorio Murino et al. Characterizing Humans on Riemannian Manifolds, 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] David G. Lowe et al. Object recognition from local scale-invariant features, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4] Pietro Perona et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[5] Bill Triggs et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6] G. Griffin et al. Caltech-256 Object Category Dataset, 2007.

[7] Matthieu Guillaumin et al. Segmentation Propagation in ImageNet, 2012, ECCV.

[8] Shuicheng Yan et al. An HOG-LBP human detector with partial occlusion handling, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9] Andrew Zisserman et al. Multiple kernels for object detection, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10] Vittorio Murino et al. Joining feature-based and similarity-based pattern description paradigms for object detection, 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[11] Vittorio Murino et al. Heterogeneous Auto-similarities of Characteristics (HASC): Exploiting Relational Information for Classification, 2013, 2013 IEEE International Conference on Computer Vision.

[12] Fatih Murat Porikli et al. Pedestrian Detection via Classification on Riemannian Manifolds, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Chih-Jen Lin et al. LIBSVM: A library for support vector machines, 2011, ACM Transactions on Intelligent Systems and Technology (TIST).