Multiple Kernel Learning with Data Augmentation

© 2016 K. Nguyen, T. Le, V. Nguyen, T.D. Nguyen & D. Phung.

The multiple kernel learning (MKL) approach is motivated by the desire to increase the expressive capacity of kernels and to avoid an expensive grid search over a wide spectrum of candidate kernels. A large body of work has sought to improve MKL in terms of computational cost and sparsity of the solution; however, these methods still either require an expensive grid search over model parameters or scale poorly with the number of kernels and training samples. In this paper, we address these issues by conjoining MKL, the stochastic gradient descent (SGD) framework, and a data augmentation technique. Our method proceeds as follows. We first develop a maximum a posteriori (MAP) view of MKL under a probabilistic setting, described by a graphical model. This view allows us to apply data augmentation to make inference over the optimal parameters tractable, in contrast to the traditional approach of training MKL via convex optimization techniques. As a result, we can use the standard SGD framework to learn the weight matrix and extend the model to support online learning. We validate our method on several benchmark datasets in both batch and online settings. The experimental results show that our method learns the parameters in a principled way, eliminating the expensive grid search while achieving a significant computational speedup over state-of-the-art baselines.
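As a rough illustration of the pipeline the abstract describes (a convex combination of base kernels whose weights are learned jointly with the classifier coefficients by stochastic subgradient steps), the following is a minimal NumPy sketch. It uses a plain hinge-loss subgradient update rather than the paper's MAP/data-augmentation inference; the function sgd_mkl, its parameters, and the crude simplex projection are illustrative assumptions, not the authors' implementation.

import numpy as np

def sgd_mkl(kernels, y, lam=0.01, lr=0.05, epochs=20, seed=0):
    """SGD sketch over a convex combination of precomputed Gram matrices.

    kernels : list of M precomputed (n, n) Gram matrices
    y       : labels in {-1, +1}, shape (n,)
    Returns (alpha, mu): per-example coefficients and kernel weights.
    Illustrative only; not the paper's augmented-variable updates.
    """
    rng = np.random.default_rng(seed)
    M, n = len(kernels), len(y)
    alpha = np.zeros(n)           # coefficients of f = sum_j mu_j * K_j @ alpha
    mu = np.full(M, 1.0 / M)      # kernel weights, kept on the simplex
    for _ in range(epochs):
        for i in rng.permutation(n):
            Ki = np.stack([K[i] for K in kernels])  # (M, n): row i of each kernel
            f_i = mu @ (Ki @ alpha)                 # decision value at sample i
            if y[i] * f_i < 1.0:                    # hinge loss is active
                alpha[i] += lr * y[i]               # subgradient step in alpha
                mu += lr * y[i] * (Ki @ alpha)      # subgradient step in mu
            alpha *= 1.0 - lr * lam                 # L2 shrinkage on alpha
            mu = np.clip(mu, 0.0, None)             # crude projection back onto
            s = mu.sum()                            # the probability simplex
            mu = mu / s if s > 0 else np.full(M, 1.0 / M)
    return alpha, mu

# Toy usage: two RBF kernels with different bandwidths on synthetic data.
X = np.random.default_rng(1).normal(size=(80, 3))
y = np.sign(X[:, 0] + 0.1)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [np.exp(-sq / (2.0 * s ** 2)) for s in (0.5, 2.0)]
alpha, mu = sgd_mkl(kernels, y)
print("learned kernel weights:", mu)

In the paper's actual setting, auxiliary-variable data augmentation would replace the hinge subgradient with updates derived from the augmented posterior; the sketch only mirrors the overall SGD-over-kernel-weights structure.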
