A novel supervised approach to learning efficient kernel descriptors for high accuracy object recognition

Discriminative patch-level features are essential for achieving good performance in many computer vision tasks. Recently, unsupervised learning approaches have been employed to design such features based on the similarities of image patches. These approaches, such as kernel descriptors (KD) and efficient kernel descriptors (EKD), have outperformed pre-defined image features (e.g., SIFT or HoG) in object recognition. They give a kernel generalization of orientation histograms and suggest a promising way to 'grow-up' features based on available information. A major limitation of these approaches is that patch similarities are not directly linked to object categories. A supervised approach to learning patch-level features that takes image class labels into account is therefore urgently needed. In this paper, we achieve this goal by proposing supervised efficient kernel descriptors (SEKD), in which incomplete Cholesky decomposition is performed jointly with image class labels during feature learning. Experimental results on several well-known image classification benchmarks suggest that SEKDs are more compact and more discriminative than previous unsupervised feature descriptors.
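
For readers unfamiliar with the building block named above, the following is a minimal sketch of the standard, unsupervised pivoted incomplete Cholesky decomposition of a kernel matrix. It is not the SEKD algorithm itself: the joint use of class labels is not reproduced here, since the abstract does not specify it. The function names `rbf_kernel` and `incomplete_cholesky`, the choice of an RBF kernel, and the parameters `gamma`, `max_rank`, and `tol` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def incomplete_cholesky(K, max_rank, tol=1e-8):
    """Greedy pivoted incomplete Cholesky: returns G with K ~= G @ G.T.

    At each step the pivot is the sample with the largest residual
    diagonal entry, i.e. the one worst explained by the pivots so far.
    """
    n = K.shape[0]
    diag = np.diag(K).astype(float)      # residual diagonal of K - G G^T
    G = np.zeros((n, max_rank))
    pivots = []
    for j in range(max_rank):
        i = int(np.argmax(diag))
        if diag[i] <= tol:               # residual is negligible: stop early
            G = G[:, :j]
            break
        pivots.append(i)
        # New column: residual of column i, scaled by sqrt of the pivot value.
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(diag[i])
        # Update the residual diagonal; clamp tiny negatives from round-off.
        diag = np.maximum(diag - G[:, j] ** 2, 0.0)
    return G, pivots
```

Each row of `G` can be read as a compact feature vector for the corresponding sample, and the selected pivots play the role of a small set of representative samples. A supervised variant would additionally let label information influence which pivots are chosen.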
