Efficient Additive Kernels via Explicit Feature Maps

Large scale nonlinear support vector machines (SVMs) can be approximated by linear ones using a suitable feature map. The linear SVMs are in general much faster to learn and evaluate (test) than the original nonlinear SVMs. This work introduces explicit feature maps for the additive class of kernels, such as the intersection, Hellinger's, and χ2 kernels, commonly used in computer vision, and enables their use in large scale problems. In particular, we: 1) provide explicit feature maps for all additive homogeneous kernels along with closed form expression for all common kernels; 2) derive corresponding approximate finite-dimensional feature maps based on a spectral analysis; and 3) quantify the error of the approximation, showing that the error is independent of the data dimension and decays exponentially fast with the approximation order for selected kernels such as χ2. We demonstrate that the approximations have indistinguishable performance from the full kernels yet greatly reduce the train/test times of SVMs. We also compare with two other approximation methods: Nystrom's approximation of Perronnin et al., which is data dependent, and the explicit map of Maji and Berg for the intersection kernel, which, as in the case of our approximations, is data independent. The approximations are evaluated on a number of standard data sets, including Caltech-101, Daimler-Chrysler pedestrians, and INRIA pedestrians.

[1]  Joachim M. Buhmann,et al.  Empirical evaluation of dissimilarity measures for color and texture , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[2]  Christopher K. I. Williams,et al.  The Effect of the Input Density Distribution on Kernel-based Classifiers , 2000, ICML.

[3]  Bernhard Schölkopf,et al.  The Kernel Trick for Distances , 2000, NIPS.

[4]  Bernhard Schölkopf,et al.  Sparse Greedy Matrix Approximation for Machine Learning , 2000, International Conference on Machine Learning.

[5]  Katya Scheinberg,et al.  Efficient SVM Training Using Low-Rank Kernel Representations , 2002, J. Mach. Learn. Res..

[6]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[7]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8]  Michael I. Jordan,et al.  Kernel independent component analysis , 2003 .

[9]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Francesca Odone,et al.  Histogram intersection kernel for image classification , 2003, Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429).

[11]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[13]  Michael I. Jordan,et al.  Predictive low-rank decomposition for kernel methods , 2005, ICML.

[14]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  Matthias Hein,et al.  Hilbertian Metrics and Positive Definite Kernels on Probability Measures , 2005, AISTATS.

[16]  Zoubin Ghahramani,et al.  Sparse Gaussian Processes using Pseudo-inputs , 2005, NIPS.

[17]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[18]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[19]  Dariu Gavrila,et al.  An Experimental Study on Pedestrian Classification , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  J. Weston,et al.  Support Vector Machine Solvers , 2007 .

[22]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[23]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[24]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2007, ICML '07.

[25]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[26]  Christoph H. Lampert,et al.  Learning to Localize Objects with Structured Output Regression , 2008, ECCV.

[27]  Chih-Jen Lin,et al.  Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines , 2008, J. Mach. Learn. Res..

[28]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Subhransu Maji,et al.  Max-margin additive classifiers for detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[30]  Svetlana Lazebnik,et al.  Locality-sensitive binary codes from shift-invariant kernels , 2009, NIPS.

[31]  Mark Everingham,et al.  Implicit color segmentation features for pedestrian and object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[32]  Cristian Sminchisescu,et al.  Efficient Match Kernel between Sets of Features for Visual Recognition , 2009, NIPS.

[33]  Shuicheng Yan,et al.  An HOG-LBP human detector with partial occlusion handling , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[34]  Andrew Zisserman,et al.  Multiple kernels for object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[35]  Yoram Singer,et al.  Efficient Learning using Forward-Backward Splitting , 2009, NIPS.

[36]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[38]  Florent Perronnin,et al.  Large-scale image categorization with explicit data embedding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[39]  Cristian Sminchisescu,et al.  Random Fourier Approximations for Skewed Multiplicative Histogram Kernels , 2010, DAGM-Symposium.

[40]  C. V. Jawahar,et al.  Generalized RBF feature maps for Efficient Detection , 2010, BMVC.

[41]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.