Accelerated max-margin multiple kernel learning

Kernel machines such as the Support Vector Machine (SVM) have achieved strong performance in pattern classification problems, largely because kernel functions let them exploit potentially nonlinear affinity structures in the data. Selecting an appropriate kernel function, or equivalently learning the kernel parameters accurately, therefore has a crucial impact on the classification performance of kernel machines. In this paper we consider the problem of learning a kernel matrix in a binary classification setup, where the hypothesis kernel family is represented as the convex hull of fixed basis kernels. While many existing approaches require computationally intensive quadratic or semidefinite optimization, we propose novel kernel learning algorithms based on large-margin estimation of Parzen window classifiers, casting the optimization as instances of linear programming. This significantly reduces the complexity of kernel learning compared to existing methods, while our large-margin formulation provides tight upper bounds on the generalization error. We empirically demonstrate that the new kernel learning methods match or improve the accuracy of existing classification algorithms while significantly reducing learning time on many real datasets, in both supervised and semi-supervised settings.
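The core idea above can be sketched concretely: with a convex combination of fixed basis kernels, the Parzen window score is linear in the combination weights, so maximizing its margin over the training set is a linear program. The sketch below is a simplified, hypothetical reading of such a formulation (the RBF basis kernels, function names, and the exact constraint set are illustrative assumptions, not the authors' precise method):

```python
# Max-margin multiple kernel learning for a Parzen window classifier,
# cast as a linear program. A minimal sketch under assumed simplifications;
# not the paper's exact formulation.
import numpy as np
from scipy.optimize import linprog

def rbf_kernel(X, Z, gamma):
    """Gaussian basis kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def learn_kernel_weights(X, y, gammas):
    """Learn convex weights beta over basis kernels by maximizing the
    margin rho of the combined Parzen window score
        f(x) = sum_m beta_m * (1/n) * sum_i y_i K_m(x_i, x),
    subject to y_j * f(x_j) >= rho for every training point j.
    The decision variables are (beta_1..beta_M, rho); since f is linear
    in beta, the whole problem is a plain LP."""
    n, M = len(X), len(gammas)
    # S[m, j] = per-kernel Parzen score of training point j
    S = np.stack([y @ rbf_kernel(X, X, g) / n for g in gammas])
    # Margin constraints rewritten as: -y_j * (S[:, j] . beta) + rho <= 0
    A_ub = np.hstack([-(y[:, None] * S.T), np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Simplex constraint: sum_m beta_m = 1, with beta_m >= 0; rho is free
    A_eq = np.array([[1.0] * M + [0.0]])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * M + [(None, None)]
    c = np.array([0.0] * M + [-1.0])  # minimize -rho, i.e. maximize rho
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    beta, rho = res.x[:M], res.x[M]
    return beta, rho

def parzen_predict(X_train, y, beta, gammas, X_test):
    """Classify by the sign of the learned combined Parzen score."""
    n = len(X_train)
    score = sum(b * (y @ rbf_kernel(X_train, X_test, g)) / n
                for b, g in zip(beta, gammas))
    return np.sign(score)
```

On well-separated data the LP returns a positive margin `rho` and a weight vector `beta` on the probability simplex; the same Parzen scoring rule is then used for prediction. Compared with the quadratic or semidefinite programs of earlier kernel-matrix learning methods, only an LP solve is needed here, which is the source of the speedup claimed above.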
