论文信息 - Kernel Descriptors for Visual Recognition

Kernel Descriptors for Visual Recognition

The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.

[1] Bernhard Schölkopf,et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem , 1998, Neural Computation.

[2] David Haussler,et al. Convolution kernels on discrete structures , 1999 .

[3] Matti Pietikäinen,et al. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[4] Tony Jebara,et al. A Kernel Between Sets of Vectors , 2003, ICML.

[5] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[6] Antonio Torralba,et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[7] Trevor Darrell,et al. The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8] Cordelia Schmid,et al. A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Siwei Lyu,et al. Mercer kernels for object recognition with local features , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Yihong Gong,et al. Deep Learning with Kernel Regularization for Visual Recognition , 2008, NIPS.

[14] Antonio Torralba,et al. Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[15] Eli Shechtman,et al. In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16] David A. McAllester,et al. A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[18] Honglak Lee,et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[19] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[20] R. Fergus,et al. Learning invariant features through topographic filter maps , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[21] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .

[22] Cristian Sminchisescu,et al. Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[23] Sebastian Nowozin,et al. On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24] James M. Rehg,et al. Beyond the Euclidean distance: Creating effective visual codebooks using the Histogram Intersection Kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Yihong Gong,et al. Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27] Lorenzo Rosasco,et al. Publisher Accessed Terms of Use Detailed Terms Mathematics of the Neural Response , 2022 .

[28] Geoffrey E. Hinton,et al. Modeling pixel means and covariances using factorized third-order boltzmann machines , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[29] Geoffrey E. Hinton,et al. Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images , 2010, AISTATS.