Tri-level Combination for Image Representation

The context surrounding an object can provide discriminative cues beyond the object itself, yet this information has not been fully exploited. In this paper, we propose Tri-level Combination for Image Representation (TriCoIR), which represents an image at three levels: object intrinsic, strongly-related context, and weakly-related context. The object-intrinsic level excludes external disturbances and focuses on the object itself. The strongly-related context is cropped from the input image with a looser bound so that it includes the object's immediate surroundings. The weakly-related context is recovered from the image regions outside the object and captures global context. First, the strongly- and weakly-related contexts are constructed from the input images. Second, we apply cascaded transformations to obtain more intrinsic object information; these rely on the consistency between the generated global context and the input image in the regions outside the object. Finally, a joint representation is built from the features of the three levels. Experiments on two benchmark datasets demonstrate the effectiveness of TriCoIR.
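
As a rough illustration of the three levels, the sketch below shows one way the inputs and the joint feature could be assembled. It is not the TriCoIR implementation: the abstract does not specify how the looser bound is chosen, how the regions outside the object are recovered, or how the three features are fused. The function names, the `margin` parameter, the zero-fill masking, and the fusion by L2-normalized concatenation are all assumptions made for this example.

```python
import numpy as np


def loose_crop(image, box, margin=0.2):
    """Crop `box` = (x0, y0, x1, y1) enlarged by `margin` of its size on every
    side and clipped to the image. `margin` is a hypothetical parameter."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    dx = int(round((x1 - x0) * margin))
    dy = int(round((y1 - y0) * margin))
    x0, y0 = max(0, x0 - dx), max(0, y0 - dy)
    x1, y1 = min(w, x1 + dx), min(h, y1 + dy)
    return image[y0:y1, x0:x1]


def outside_object(image, box):
    """Return a copy of the image with the object box zeroed out, leaving only
    the regions outside the object (zero-fill is an assumption)."""
    x0, y0, x1, y1 = box
    out = image.copy()
    out[y0:y1, x0:x1] = 0
    return out


def joint_representation(f_object, f_strong, f_weak):
    """Fuse three per-level feature vectors by L2-normalizing each one and
    concatenating them (an assumed fusion rule, not taken from the paper)."""
    parts = []
    for f in (f_object, f_strong, f_weak):
        f = np.asarray(f, dtype=np.float64)
        parts.append(f / (np.linalg.norm(f) + 1e-12))
    return np.concatenate(parts)
```

Under these assumptions, `loose_crop(image, box, margin=0.0)` yields the tight object crop, a positive `margin` yields the strongly-related context, and `outside_object(image, box)` yields the input for the weakly-related context. Any feature extractor can then be applied to the three inputs before calling `joint_representation`.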
