Incorporation of radius-info can be simple with SimpleMKL

Recent research has shown the benefit of incorporating the radius of the Minimal Enclosing Ball (MEB) of the training data into Multiple Kernel Learning (MKL). However, incorporating this radius directly leads to a complex learning structure and considerably increased computation. Moreover, the notorious sensitivity of the radius to outliers can adversely affect MKL. In this paper, instead of directly incorporating the radius of the MEB, we incorporate its close relative, the trace of the data scattering matrix, to avoid these problems. By analyzing the characteristics of the resulting optimization, we show that the benefit of incorporating the radius of the MEB is fully retained. More importantly, our algorithm can be realized effortlessly within existing MKL frameworks such as SimpleMKL; the only difference lies in how the basis kernels are normalized. Although this kernel normalization is not our invention, our theoretical derivation reveals why it achieves better classification performance, an explanation that has not appeared in the literature before. As the experiments demonstrate, our method achieves the overall best learning performance in various settings. From another perspective, our work improves SimpleMKL so that it utilizes the radius information of the MEB in an efficient and practical way.
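
To make the normalization idea concrete, the sketch below illustrates the kind of preprocessing the abstract refers to: each basis kernel is divided by the trace of the data scattering matrix it induces in feature space, which for a kernel matrix K over n samples equals tr(K) - (1/n) 1^T K 1, after which an off-the-shelf SimpleMKL solver can be run unchanged. This is only a minimal sketch under these assumptions; the names scatter_trace, normalize_kernels, and run_simplemkl are illustrative placeholders, not functions from the paper or any particular library.

import numpy as np

def scatter_trace(K):
    # Trace of the scattering (total scatter) matrix in the feature space
    # induced by kernel matrix K over n samples:
    #   tr(S) = tr(K) - (1/n) * 1^T K 1
    n = K.shape[0]
    return np.trace(K) - K.sum() / n

def normalize_kernels(kernels):
    # Divide each basis kernel by the trace of its scattering matrix,
    # giving every candidate feature space the same data "spread".
    return [K / scatter_trace(K) for K in kernels]

# Hypothetical usage with precomputed basis kernel matrices K1, K2, K3:
#   normalized = normalize_kernels([K1, K2, K3])
#   run_simplemkl(normalized, labels)   # any standard SimpleMKL implementation
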
