An Adaptive Approach to Learning Optimal Neighborhood Kernels

Learning an optimal kernel plays a pivotal role in kernel-based methods. Recently, an approach called optimal neighborhood kernel learning (ONKL) has been proposed and has shown promising classification performance. It assumes that the optimal kernel resides in the neighborhood of a “pre-specified” kernel. However, how to specify such a kernel in a principled way remains unclear. To address this issue, this paper treats the pre-specified kernel as an extra variable and learns it jointly with the optimal neighborhood kernel and the structure parameters of support vector machines. To avoid trivial solutions, we constrain the pre-specified kernel with a parameterized model. We first discuss the characteristics of our approach, highlighting in particular its adaptivity. We then present two instantiations, which model the pre-specified kernel as a common Gaussian radial basis function kernel and as a linear combination of a set of base kernels in the manner of multiple kernel learning (MKL), respectively. We show that the optimization in our approach is a min-max problem that can be solved efficiently by employing the extended level method and Nesterov's method. We also give a probabilistic interpretation of our approach and use it to explain existing kernel learning methods, which offers another perspective on their commonalities and differences. Comprehensive experimental results on 13 UCI data sets and two additional real-world data sets show that, through the joint learning process, our approach not only adaptively identifies the pre-specified kernel but also achieves classification performance superior to that of the original ONKL and related MKL algorithms.
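To make the idea of a kernel "in the neighborhood" of a pre-specified kernel concrete, the sketch below builds the two pre-specified kernel models named in the abstract (a Gaussian RBF kernel and a weighted combination of base kernels) and measures how far a learned kernel G lies from the pre-specified kernel K0. It rests on several assumptions that are not taken from the paper: the neighborhood is measured by the squared Frobenius norm, the MKL weights mu are restricted to the probability simplex, and the inner problem is taken to be the penalized form min over G of -½ vᵀGv + ρ‖G − K0‖²_F with v = α ∘ y, whose minimizer is G = K0 + vvᵀ/(4ρ). The helper names (`rbf_kernel`, `combined_kernel`, `frobenius_sq`, `neighborhood_kernel`) are hypothetical, and the min-max solver (extended level method, Nesterov's method) is not reproduced; this is an illustrative sketch only.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(X: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """Gaussian RBF kernel: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq)
    return K


def combined_kernel(base: Sequence[Matrix], mu: Sequence[float]) -> Matrix:
    """MKL-style model K0 = sum_p mu_p * K_p.

    Assumption: mu lies on the probability simplex (nonnegative, sums to 1).
    """
    if any(m < 0 for m in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("mu must be nonnegative and sum to 1")
    n = len(base[0])
    return [[sum(m * Kp[i][j] for m, Kp in zip(mu, base)) for j in range(n)]
            for i in range(n)]


def frobenius_sq(A: Matrix, B: Matrix) -> float:
    """Squared Frobenius distance ||A - B||_F^2 (assumed neighborhood measure)."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def neighborhood_kernel(K0: Matrix, alpha: Sequence[float],
                        y: Sequence[int], rho: float) -> Matrix:
    """Minimizer of -0.5 * v^T G v + rho * ||G - K0||_F^2 with v = alpha * y.

    Assumed objective, for illustration. Setting the gradient
    -0.5 * v v^T + 2 * rho * (G - K0) to zero gives G = K0 + v v^T / (4 * rho).
    """
    v = [a * t for a, t in zip(alpha, y)]
    n = len(K0)
    return [[K0[i][j] + v[i] * v[j] / (4.0 * rho) for j in range(n)]
            for i in range(n)]


# Toy usage: three points, two base kernels (RBF and linear).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
y = [1, -1, 1]
K_rbf = rbf_kernel(X, gamma=0.5)
K_lin = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]
K0 = combined_kernel([K_rbf, K_lin], [0.7, 0.3])
G = neighborhood_kernel(K0, alpha=[0.5, 1.0, 0.5], y=y, rho=2.0)
print(frobenius_sq(G, K0))
```

Under this assumed objective, vvᵀ is positive semidefinite, so G stays positive semidefinite whenever K0 is, and its squared distance from K0 equals ‖v‖⁴/(16ρ²) (about 0.0352 in the toy example). A larger ρ therefore keeps the learned kernel closer to the pre-specified one.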
