On the Convergence Rate of ℓp-Norm Multiple Kernel Learning

We derive an upper bound on the local Rademacher complexity of ℓp-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches analyzed only the case p = 1, while our analysis covers all cases 1 ≤ p ≤ ∞, under the assumption that the feature mappings corresponding to the different kernels are uncorrelated. We also give a lower bound showing that the complexity bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order O(n^{−α/(1+α)}), where α is the minimum eigenvalue decay rate of the individual kernels.
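For intuition, here is a brief gloss (not part of the original abstract), assuming the eigenvalue decay is polynomial, i.e. that the j-th eigenvalue of each kernel satisfies λ_j = O(j^{−α}) with α > 1 so that the eigenvalues are summable; the notation λ_j is introduced only for this illustration. Under that reading, the exponent of the stated rate interpolates between the classical global rate and the parametric rate:

\[
  O\bigl(n^{-\frac{\alpha}{1+\alpha}}\bigr),
  \qquad
  \frac{\alpha}{1+\alpha} \in \Bigl(\tfrac{1}{2},\, 1\Bigr) \ \text{for every } \alpha > 1,
  \qquad
  \frac{\alpha}{1+\alpha} \longrightarrow 1 \ \text{as } \alpha \to \infty .
\]

Thus faster eigenvalue decay (larger α) yields rates approaching O(n^{−1}), whereas global Rademacher complexity arguments are limited to rates of order O(n^{−1/2}).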
