Learning With Kernels: A Local Rademacher Complexity-Based Analysis With Application to Graph Kernels

When dealing with kernel methods, one has to decide which kernel and which hyperparameter values to use. Resampling techniques can address this issue, but they are time-consuming. The problem is particularly challenging for structured data, and for graphs in particular: several kernels for graph data have been proposed in the literature, but no clear relationship among them in terms of learning properties has been established. In these cases, exhaustive search seems to be the only reasonable approach. Recently, the global Rademacher complexity (RC) and the local Rademacher complexity (LRC), two powerful measures of the complexity of a hypothesis space, have been shown to be suited to studying kernel properties. In particular, the LRC is able to bound the generalization error of a hypothesis chosen in a space by disregarding those hypotheses that no learning procedure would select because of their high error. In this paper, we show a new approach to efficiently bound the RC of the space induced by a kernel, since its exact computation is an NP-hard problem. We then show, for the first time, that the RC can be used to estimate the accuracy and expressivity of different graph kernels under different parameter configurations. Our claims are supported by experimental results on several real-world graph data sets.
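To make the notion of bounding the RC of a kernel-induced space concrete, the sketch below uses the standard textbook setting of a norm-bounded ball in the reproducing kernel Hilbert space. This is not the bound proposed in the paper. For that class, the supremum over hypotheses has the closed form (B/n)·sqrt(σᵀKσ), and Jensen's inequality gives the well-known upper bound (B/n)·sqrt(trace(K)). The function names, the radius `B`, and the toy linear kernel are assumptions made for illustration only. The exact value is computed by enumerating all sign vectors, which is feasible only for very small samples.

```python
import itertools
import math


def exact_rc_kernel_ball(K, B=1.0):
    """Exact empirical Rademacher complexity of {f : ||f|| <= B} in the RKHS.

    For a fixed sign vector sigma the supremum over the ball equals
    (B / n) * sqrt(sigma^T K sigma); we average it over all 2^n sign vectors.
    Exponential in n, so only usable on tiny samples.
    """
    n = len(K)
    total = 0.0
    for sigma in itertools.product((-1.0, 1.0), repeat=n):
        quad = sum(sigma[i] * K[i][j] * sigma[j]
                   for i in range(n) for j in range(n))
        total += math.sqrt(max(quad, 0.0))  # clip tiny negative round-off
    return B * total / (n * 2 ** n)


def trace_bound(K, B=1.0):
    """Upper bound (B / n) * sqrt(trace(K)), obtained via Jensen's inequality."""
    n = len(K)
    return B * math.sqrt(sum(K[i][i] for i in range(n))) / n


# Toy data with a linear kernel (hypothetical, for illustration only).
X = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
K = [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

exact = exact_rc_kernel_ball(K)
bound = trace_bound(K)
print(f"exact RC = {exact:.4f}, trace bound = {bound:.4f}")
```

The trace bound needs only the diagonal of the Gram matrix, which is why bounds of this kind are cheap to evaluate across many kernels and parameter configurations, whereas the enumeration grows as 2^n.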
