Online multiple kernel regression

Kernel-based regression represents an important family of learning techniques for solving challenging regression tasks with non-linear patterns. Despite being studied extensively, most existing work suffers from two major drawbacks: (i) it is often designed for regression tasks in a batch learning setting, which makes it computationally inefficient and poorly scalable in real-world applications where data arrives sequentially; and (ii) it usually assumes that a fixed kernel function is given prior to the learning task, which can result in poor performance if the chosen kernel is inappropriate. To overcome these drawbacks, this paper presents a novel scheme of Online Multiple Kernel Regression (OMKR). OMKR sequentially learns the kernel-based regressor in an online and scalable fashion. It also dynamically explores a pool of multiple diverse kernels so that it does not depend on a single, possibly poor, fixed kernel, which remedies the drawback of manual or heuristic kernel selection. The OMKR problem is more challenging than regular kernel-based regression because we must determine on the fly both the optimal kernel-based regressor for each individual kernel and the best combination of the multiple kernel regressors. In this paper, we propose a family of OMKR algorithms for regression and discuss their application to time series prediction tasks. We also analyze the theoretical bounds of the proposed OMKR method and conduct extensive experiments to evaluate its empirical performance on both real-world regression and time series prediction tasks.
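
The two sub-problems named in the abstract (learning a regressor for each kernel online, and combining the per-kernel regressors) can be illustrated with a minimal Python sketch. It is not the paper's exact OMKR update. It assumes one online functional-gradient-descent regressor per kernel on the squared loss, combined through Hedge-style multiplicative weights (Freund and Schapire's Hedge algorithm) with losses clipped to [0, 1]. The class and function names, the step size `eta`, the discount factor `beta`, and the clipping rule are all hypothetical choices made for illustration. The support set also grows without bound here, whereas a scalable method would need some budgeting strategy.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian_kernel(gamma: float) -> Kernel:
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    def k(x: Vector, z: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
    return k


class OnlineKernelRegressor:
    """Single-kernel regressor f(x) = sum_i alpha_i * k(x_i, x), trained by
    online functional gradient descent on the squared loss (hypothetical sketch)."""

    def __init__(self, kernel: Kernel, eta: float) -> None:
        self.kernel = kernel
        self.eta = eta  # step size: an assumed value, not taken from the paper
        self.support: List[Tuple[Vector, float]] = []  # (x_i, alpha_i) pairs

    def predict(self, x: Vector) -> float:
        return sum(alpha * self.kernel(xi, x) for xi, alpha in self.support)

    def update(self, x: Vector, y: float, y_hat: float) -> None:
        # The functional gradient of 0.5 * (f(x) - y)^2 is (f(x) - y) * k(x, .),
        # so one gradient step adds x as a support vector with this coefficient.
        # The support set is unbounded here; no budget is enforced.
        self.support.append((x, -self.eta * (y_hat - y)))


class HedgeMultiKernelRegressor:
    """One online regressor per kernel, combined with Hedge-style weights."""

    def __init__(self, kernels: Sequence[Kernel], eta: float = 0.5,
                 beta: float = 0.8) -> None:
        self.learners = [OnlineKernelRegressor(k, eta) for k in kernels]
        self.weights = [1.0 / len(kernels)] * len(kernels)
        self.beta = beta  # multiplicative discount factor in (0, 1)

    def predict(self, x: Vector) -> Tuple[float, List[float]]:
        preds = [learner.predict(x) for learner in self.learners]
        combined = sum(w * p for w, p in zip(self.weights, preds)) / sum(self.weights)
        return combined, preds

    def learn(self, x: Vector, y: float) -> float:
        """Predict on x, then observe y and update; returns the prediction."""
        y_hat, preds = self.predict(x)
        for i, (learner, p) in enumerate(zip(self.learners, preds)):
            loss = min(1.0, (p - y) ** 2)  # clip, since Hedge assumes losses in [0, 1]
            self.weights[i] *= self.beta ** loss
            learner.update(x, y, p)  # each kernel learner is trained independently
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]  # renormalize to avoid underflow
        return y_hat


if __name__ == "__main__":
    # Toy stream: y = sin(x) at points that wrap around [0, 2*pi).
    model = HedgeMultiKernelRegressor([gaussian_kernel(g) for g in (0.1, 1.0, 10.0)])
    for t in range(200):
        x = [(0.37 * t) % (2 * math.pi)]
        model.learn(x, math.sin(x[0]))
    print("kernel weights:", model.weights)
```

Because the combination weights shrink multiplicatively with each kernel's loss, kernels whose regressors fit the stream poorly lose influence over time. This loosely mirrors the abstract's goal of not relying on a single fixed, possibly poor kernel.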
