Composite kernel learning

The Support Vector Machine (SVM) is an acknowledged powerful tool for building classifiers, but it lacks flexibility in the sense that the kernel is chosen prior to learning. Multiple Kernel Learning (MKL) makes it possible to learn the kernel from an ensemble of basis kernels, whose combination is optimized during the learning process. Here, we propose Composite Kernel Learning to address the situation where distinct components give rise to a group structure among kernels. Our formulation of the learning problem encompasses several setups, putting more or less emphasis on the group structure. We characterize the convexity of the learning problem and provide a general wrapper algorithm for computing solutions. Finally, we illustrate the behavior of our method on multi-channel data, where groups correspond to channels.
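As a sketch of the kind of objective at stake, consider the following formulation; the notation below (weights \(\sigma_m\), basis kernels \(K_m\), groups \(\mathcal{G}_\ell\), exponents \(p\) and \(q\)) is introduced here for illustration and is an assumption, not the paper's exact parameterization. In MKL, the effective kernel is a convex combination of basis kernels,

\[
K(x, x') = \sum_{m=1}^{M} \sigma_m \, K_m(x, x'),
\qquad \sigma_m \ge 0, \qquad \sum_{m=1}^{M} \sigma_m = 1 ,
\]

which amounts to an \(\ell_1\)-type penalty on the kernel weights and hence to sparsity over individual kernels. A composite variant replaces this flat penalty with a mixed-norm one that respects a partition of the kernels into groups \(\mathcal{G}_1, \dots, \mathcal{G}_L\) (for instance, one group per channel):

\[
\Omega_{\mathrm{CKL}}(\sigma)
= \sum_{\ell=1}^{L} \Bigl( \sum_{m \in \mathcal{G}_\ell} \sigma_m^{\,q} \Bigr)^{p/q} .
\]

Taking one group per kernel recovers plain MKL, while other settings of \((p, q)\) trade off sparsity between groups against sparsity within groups, e.g. discarding whole channels versus discarding individual kernels inside a retained channel.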
