Limited Stochastic Meta-Descent for Kernel-Based Online Learning

To improve the single-run performance of online learning and to reinforce its stability, this letter considers online learning with a limited adaptive learning rate. We extend the convergence proofs for NORMA to a range of step sizes and then employ support vector learning with stochastic meta-descent (SVMD), restricted to that range, for step-size adaptation. The result is an online kernel algorithm that combines theoretical convergence guarantees with good practical performance. Experiments on several data sets corroborate the theoretical results and show that our method is a promising approach to online learning.
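As a rough illustration of the approach described above, the sketch below combines a NORMA-style kernel update (hinge loss with coefficient decay) with a multiplicative SMD-like step-size rule whose result is clamped to a fixed interval [eta_min, eta_max]. This is a minimal, simplified rendering and not the authors' SVMD: the SMD gradient trace is reduced to a single scalar rather than an element of the RKHS, and the class name, hyperparameter names, and default values (LimitedSMDNorma, eta0, mu, eta_min, eta_max, the RBF kernel width gamma) are illustrative assumptions.

import numpy as np


def rbf_kernel(x, y, gamma=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))


class LimitedSMDNorma:
    """NORMA-style online kernel classifier with a clamped, SMD-like
    scalar step size.  A simplified sketch, not the authors' SVMD."""

    def __init__(self, lam=0.01, eta0=0.5, mu=0.1,
                 eta_min=0.05, eta_max=1.0, gamma=1.0):
        self.lam = lam          # regularization parameter (NORMA)
        self.eta = eta0         # current step size
        self.mu = mu            # SMD meta-learning rate (assumed scalar)
        self.eta_min = eta_min  # lower end of the admissible step-size range
        self.eta_max = eta_max  # upper end of the admissible step-size range
        self.gamma = gamma      # RBF kernel width
        self.X = []             # stored examples (expansion points)
        self.alpha = []         # their expansion coefficients
        self.v = 0.0            # SMD gradient trace, reduced to a scalar here

    def predict(self, x):
        # f(x) = sum_i alpha_i * k(x_i, x)
        return sum(a * rbf_kernel(xi, x, self.gamma)
                   for a, xi in zip(self.alpha, self.X))

    def update(self, x, y):
        # One NORMA step on (x, y) with hinge loss, then adapt eta.
        f_x = self.predict(x)
        g = -y if y * f_x < 1.0 else 0.0  # hinge-loss subgradient w.r.t. f(x)

        # SMD-like multiplicative step-size adaptation, clamped ("limited")
        # to the interval for which convergence is assumed to hold.
        self.eta *= max(0.5, 1.0 - self.mu * g * self.v)
        self.eta = min(max(self.eta, self.eta_min), self.eta_max)
        self.v = (1.0 - self.eta * self.lam) * self.v - self.eta * g

        # NORMA update: shrink old coefficients, append one for the new point.
        decay = 1.0 - self.eta * self.lam
        self.alpha = [decay * a for a in self.alpha]
        if g != 0.0:
            self.X.append(np.asarray(x, dtype=float))
            self.alpha.append(-self.eta * g)


# Toy usage: count mistakes on a linearly separable stream.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    learner = LimitedSMDNorma()
    mistakes = 0
    for t in range(200):
        x = rng.normal(size=2)
        y = 1.0 if x[0] + x[1] > 0 else -1.0
        if learner.predict(x) * y <= 0:
            mistakes += 1
        learner.update(x, y)
    print("mistakes on 200 rounds:", mistakes)

Clamping eta to the admissible interval is what "limited" refers to here: the adaptation may speed learning up or slow it down, but the step size never leaves the range for which convergence is established.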
