Explicit update vs implicit update

In this paper, we consider the problem of implicit online learning. We derive a tighter convergence bound, which theoretically demonstrates the feasibility of the implicit update for online learning. We then combine stochastic meta-descent (SMD) with the implicit update technique; the resulting algorithm possesses inherent stability. The theoretical result is well corroborated by our experiments, which also indicate that combining SMD with the implicit update technique is a promising approach to online learning.
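To make the contrast in the title concrete, consider the squared loss l_t(w) = (1/2)(<w, x_t> - y_t)^2. The explicit update evaluates the gradient at the current iterate, w_{t+1} = w_t - eta * grad l_t(w_t), while the implicit update evaluates it at the new iterate, w_{t+1} = w_t - eta * grad l_t(w_{t+1}); for the squared loss this fixed-point equation has a closed-form solution. The sketch below is a minimal illustration under these assumptions only: the function names are ours, a single scalar learning rate stands in for the per-coordinate gains that SMD would adapt, and it is not the paper's exact algorithm.

```python
import numpy as np

def explicit_update(w, x, y, eta):
    # Explicit online gradient step for the squared loss
    # l_t(w) = 0.5 * (w @ x - y)**2: gradient taken at the current w.
    residual = w @ x - y
    return w - eta * residual * x

def implicit_update(w, x, y, eta):
    # Implicit step: w_new = w - eta * grad l_t(w_new).
    # For the squared loss the fixed point is available in closed form;
    # the residual is damped by 1 / (1 + eta * x @ x), so the effective
    # step size shrinks automatically on large inputs. This damping is
    # the kind of inherent stability the abstract refers to.
    residual = (w @ x - y) / (1.0 + eta * (x @ x))
    return w - eta * residual * x

# Toy run (hypothetical data): with an aggressive learning rate the
# explicit update typically blows up, while the implicit one stays bounded.
rng = np.random.default_rng(0)
w_exp = w_imp = np.zeros(3)
for _ in range(100):
    x = rng.normal(size=3)
    y = x @ np.array([1.0, -2.0, 0.5])
    w_exp = explicit_update(w_exp, x, y, eta=1.0)
    w_imp = implicit_update(w_imp, x, y, eta=1.0)
print(np.linalg.norm(w_exp), np.linalg.norm(w_imp))
```

The damping factor is why no step-size tuning can make the implicit update overshoot on this loss, whereas the explicit step is only stable when eta * x @ x stays below 2; SMD-style gain adaptation would additionally adjust eta per coordinate online.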
