Birds of a feather flock together: Visual representation with scale and class consistency

Abstract There are three problems with a local-feature based representation scheme. First, local regions are often densely extracted or determined through detection without considering the scales of local regions. Second, local features are encoded separately, leaving the relationship among them unconsidered. Third, local features are simply encoded without considering the class information. To solve these problems, in this paper, we propose a scale and class consistent local-feature encoding method for image representation, which is achieved through the dense extraction of local features in different scale spaces, and the subsequent learning of the encoding parameters. In addition, instead of encoding each local feature independently, we jointly optimize the encoding parameters of the local features. Moreover, we also impose class consistency during the local-feature encoding process. We test the discriminative power of image representations on image classification tasks. Experiments on several public image datasets demonstrate that the proposed method achieves a superior performance compared with many other local-feature based methods.

[1]  Jean Ponce,et al.  Learning Discriminative Part Detectors for Image Classification and Cosegmentation , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  Shuicheng Yan,et al.  Visual classification with multi-task joint sparse representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Qingming Huang,et al.  Image classification by non-negative sparse coding, correlation constrained low-rank and sparse decomposition , 2014, Comput. Vis. Image Underst..

[4]  Qi Tian,et al.  Incremental Codebook Adaptation for Visual Representation and Categorization , 2018, IEEE Transactions on Cybernetics.

[5]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Qi Tian,et al.  Fine-Grained Image Classification via Low-Rank Sparse Coding With General and Class-Specific Codebooks , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[8]  Qi Tian,et al.  Image Class Prediction by Joint Object, Context, and Background Modeling , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[9]  Qi Tian,et al.  Image classification by non-negative sparse coding, low-rank and sparse decomposition , 2011, CVPR 2011.

[10]  Qi Tian,et al.  Structured Weak Semantic Space Construction for Visual Categorization , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[11]  Qi Tian,et al.  Contextual Exemplar Classifier-Based Image Representation for Classification , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[12]  Liang-Tien Chia,et al.  Laplacian Sparse Coding, Hypergraph Laplacian Sparse Coding, and Applications , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[15]  Ilja Kuzborskij,et al.  When Naïve Bayes Nearest Neighbors Meet Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Jun Yu,et al.  Click Prediction for Web Image Reranking Using Multimodal Sparse Coding , 2014, IEEE Transactions on Image Processing.

[17]  Qi Tian,et al.  Image-Specific Classification With Local and Global Discriminations , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[18]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[20]  Xiao Liu,et al.  Recognizing architecture styles by hierarchical sparse coding of blocklets , 2014, Inf. Sci..

[21]  James M. Rehg,et al.  Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[22]  Yi Yang,et al.  Compact and Discriminative Descriptor Inference Using Multi-Cues , 2015, IEEE Transactions on Image Processing.

[23]  Y. Rui,et al.  Learning to Rank Using User Clicks and Visual Features for Image Retrieval , 2015, IEEE Transactions on Cybernetics.

[24]  Qi Tian,et al.  Bundled Local Features for Image Representation , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[25]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[26]  Qi Tian,et al.  Image classification by search with explicitly and implicitly semantic representations , 2017, Inf. Sci..

[27]  Qi Tian,et al.  Image-level classification by hierarchical structure learning with visual and semantic similarities , 2018, Inf. Sci..

[28]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[29]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[30]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Andrew Zisserman,et al.  A Visual Vocabulary for Flower Classification , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[33]  Qi Tian,et al.  Boosted random contextual semantic space based representation for visual recognition , 2016, Inf. Sci..

[34]  Qi Tian,et al.  Image classification using boosted local features with random orientation and location selection , 2015, Inf. Sci..

[35]  Graham W. Taylor,et al.  Deconvolutional networks , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[36]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[37]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[38]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[39]  Chun Chen,et al.  Image-based facial sketch-to-photo synthesis via online coupled dictionary learning , 2012, Inf. Sci..

[40]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[41]  Qi Tian,et al.  Beyond Explicit Codebook Generation: Visual Representation Using Implicitly Transferred Codebooks , 2015, IEEE Transactions on Image Processing.

[42]  Baoxin Li,et al.  Discriminative affine sparse codes for image classification , 2011, CVPR 2011.

[43]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[44]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Wenyu Liu,et al.  Feature context for image classification and object detection , 2011, CVPR 2011.

[46]  Xuelong Li,et al.  Image Classification With Densely Sampled Image Windows and Generalized Adaptive Multiple Kernel Learning , 2015, IEEE Transactions on Cybernetics.

[47]  Qi Tian,et al.  Multiview Label Sharing for Visual Representations and Classifications , 2018, IEEE Transactions on Multimedia.