Kernel learning and optimization with Hilbert–Schmidt independence criterion

Measures of statistical dependence between random variables have been applied successfully to many machine learning tasks, such as independent component analysis, feature selection, clustering and dimensionality reduction. This success rests on the fact that many learning tasks can be cast as problems of dependence maximization (or minimization). Motivated by this, we present a unifying view of kernel learning via statistical dependence estimation. The key idea is that a good kernel should maximize the statistical dependence between the kernel and the class labels. The dependence is measured by the Hilbert–Schmidt independence criterion (HSIC), which is traditionally used to measure the statistical dependence between random variables and is computed as the Hilbert–Schmidt norm of the cross-covariance operator of the mapped samples in the corresponding Hilbert spaces. As a special case of kernel learning, we propose a Gaussian kernel optimization method for classification that maximizes the HSIC, considering two forms of Gaussian kernel (spherical and ellipsoidal). Extensive experiments on real-world data sets from the UCI benchmark repository show that the proposed approach outperforms the compared methods in both prediction accuracy and computational efficiency.
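The idea of choosing a kernel by maximizing its HSIC with the class labels can be illustrated with a minimal sketch. The code below is not the optimization procedure proposed in the paper: it assumes the standard biased empirical estimator tr(KHLH)/(n-1)^2, a delta kernel on the labels, a spherical Gaussian kernel with a single width sigma, and a plain search over a candidate list. The function names (gaussian_kernel, label_kernel, hsic, select_sigma) are hypothetical and introduced only for illustration.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Spherical Gaussian kernel matrix: exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def label_kernel(y):
    """Delta kernel on labels (an assumption): 1 if same class, else 0."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)


def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def select_sigma(X, y, sigmas):
    """Return the candidate width with the largest HSIC, plus all scores."""
    L = label_kernel(y)
    scores = [hsic(gaussian_kernel(X, s), L) for s in sigmas]
    return sigmas[int(np.argmax(scores))], scores


if __name__ == "__main__":
    # Toy data: two well-separated one-dimensional clusters.
    X = np.array([[0.0], [0.1], [5.0], [5.1]])
    y = np.array([0, 0, 1, 1])
    best, scores = select_sigma(X, y, [0.01, 1.0, 100.0])
    print("selected sigma:", best)
    print("HSIC scores:", scores)
```

A search over a fixed candidate list is used here only to keep the sketch short; a gradient-based optimizer over sigma (or over per-feature widths for an ellipsoidal kernel) would be a natural alternative under the same objective.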
