Efficient and Accurate Lp-Norm Multiple Kernel Learning

Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, l1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary lp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply lp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.

[1]  Kenneth O. Kortanek,et al.  Semi-Infinite Programming: Theory, Methods, and Applications , 1993, SIAM Rev..

[2]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[3]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[4]  Kenta Nakai,et al.  DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs , 2002, Nucleic Acids Res..

[5]  Nello Cristianini,et al.  Learning the Kernel Matrix with Semidefinite Programming , 2002, J. Mach. Learn. Res..

[6]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[7]  Sin Lam Tan,et al.  Promoter prediction analysis on the whole human genome , 2004, Nature Biotechnology.

[8]  Charles A. Micchelli,et al.  Learning the Kernel Function via Regularization , 2005, J. Mach. Learn. Res..

[9]  Gunnar Rätsch,et al.  ARTS: accurate recognition of transcription starts in human , 2006, ISMB.

[10]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[11]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[12]  H. Leeb,et al.  Sparse Estimators and the Oracle Property, or the Return of Hodges' Estimator , 2007, 0704.1466.

[13]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[14]  Cheng Soon Ong,et al.  Multiclass multiple kernel learning , 2007, ICML '07.

[15]  Yves Grandvalet,et al.  More efficiency in multiple kernel learning , 2007, ICML '07.

[16]  Cheng Soon Ong,et al.  An Automated Combination of Kernels for Predicting Protein Subcellular Localization , 2007, WABI.

[17]  A. Rinaldo,et al.  On the asymptotic properties of the group lasso estimator for linear models , 2008 .

[18]  Francis R. Bach,et al.  Consistency of the group Lasso and multiple kernel learning , 2007, J. Mach. Learn. Res..

[19]  Yves Grandvalet,et al.  Composite kernel learning , 2008, ICML.

[20]  Zenglin Xu,et al.  An Extended Level Method for Efficient Multiple Kernel Learning , 2008, NIPS.

[21]  Jieping Ye,et al.  Multi-label Multiple Kernel Learning , 2008, NIPS.

[22]  Mark J. F. Gales,et al.  Combining Derivative and Parametric Kernels for Speaker Verification , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[23]  Yvan Saeys,et al.  Toward a gold standard for promoter prediction evaluation , 2009, Bioinform..