Paper information - Local Deep Kernel Learning for Efficient Non-linear SVM Prediction

Our objective is to speed up non-linear SVM prediction while maintaining classification accuracy above an acceptable limit. We generalize Localized Multiple Kernel Learning so as to learn a tree-based primal feature embedding which is high dimensional and sparse. Primal-based classification decouples prediction costs from the number of support vectors, and our tree-structured features efficiently encode non-linearities while speeding up prediction exponentially over the state of the art. We develop routines for optimizing over the space of tree-structured features and efficiently scale to problems with more than half a million training points. Experiments on benchmark data sets reveal that our formulation can reduce prediction costs by more than three orders of magnitude in some cases, with a moderate sacrifice in classification accuracy compared to RBF-SVMs. Furthermore, our formulation achieves better classification accuracy than leading methods.

Prasoon Goyal | Manik Varma | Cijo Jose | Parv Aggrwal
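The abstract's claim that tree-structured primal features decouple prediction cost from the number of support vectors can be illustrated with a rough operation count. The Python sketch below assumes a simplified predictor: a complete binary tree stored in heap order, where each node k holds a routing vector θ_k, a local-feature vector v_k and a local linear classifier w_k, and the score of x sums tanh(σ v_kᵀx) · w_kᵀx over the nodes on its root-to-leaf path. This is a simplified reading for illustration, not the authors' code; the function names, parameter names, demo sizes and the `ops` counter are all hypothetical, and training of θ, v and W is not shown.

```python
# Hypothetical sketch: a simplified tree-gated local-linear predictor compared
# with a dual RBF-SVM. Not the authors' implementation; training is omitted.
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def rbf_dual_predict(x, support_vectors, alphas, bias, gamma):
    """Dual RBF-SVM score: one kernel evaluation per support vector.

    Returns (score, ops); ops roughly counts per-coordinate multiply-adds,
    so the cost grows as O(N_sv * D).
    """
    score, ops = bias, 0
    for sv, alpha in zip(support_vectors, alphas):
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, sv))
        score += alpha * math.exp(-gamma * sq_dist)
        ops += len(x)
    return score, ops


def tree_local_predict(x, theta, v, W, sigma):
    """Hypothetical LDKL-style primal score on a complete binary tree.

    Nodes are stored in heap order (children of k are 2k+1 and 2k+2).
    Only nodes on the root-to-leaf path of x get a non-zero local feature,
    so the cost is O(depth * D) rather than O(n_nodes * D).
    """
    score, ops, k = 0.0, 0, 0
    while k < len(theta):
        phi_k = math.tanh(sigma * dot(v[k], x))  # local feature at node k
        score += phi_k * dot(W[k], x)            # weighted local linear classifier
        ops += 3 * len(x)                        # three length-D inner products
        k = 2 * k + 1 if dot(theta[k], x) > 0 else 2 * k + 2  # route left or right
    return score, ops


def random_vector(dim):
    """Standard-normal random vector (illustrative data only)."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    random.seed(0)
    D, N_SV, DEPTH = 20, 5000, 5   # hypothetical sizes, chosen for illustration
    n_nodes = 2 ** DEPTH - 1       # complete tree with 31 nodes
    x = random_vector(D)
    svs = [random_vector(D) for _ in range(N_SV)]
    alphas = [random.uniform(-1.0, 1.0) for _ in range(N_SV)]
    theta = [random_vector(D) for _ in range(n_nodes)]
    v = [random_vector(D) for _ in range(n_nodes)]
    W = [random_vector(D) for _ in range(n_nodes)]
    _, rbf_ops = rbf_dual_predict(x, svs, alphas, bias=0.0, gamma=0.05)
    _, tree_ops = tree_local_predict(x, theta, v, W, sigma=1.0)
    print(f"dual RBF-SVM ops: {rbf_ops}, tree-local ops: {tree_ops}")
    # prints: dual RBF-SVM ops: 100000, tree-local ops: 300
```

Because a point touches only depth nodes out of 2^depth - 1, the cost in this sketch grows with the logarithm of the number of local classifiers, whereas the dual RBF-SVM pays one kernel evaluation per support vector regardless of how the model was trained.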
