Tuning-free step-size adaptation

Incremental learning algorithms based on gradient descent are effective and popular in online supervised learning, reinforcement learning, signal processing, and many other application areas. An oft-noted drawback of these algorithms is that they include a step-size parameter that needs to be tuned for best performance, which may require manual intervention, significant domain knowledge, or additional data. In many cases, an entire vector of step-size parameters (e.g., one for each input feature) needs to be tuned in order to attain the best performance of the algorithm. To address this, several methods have been proposed for adapting step sizes online. For example, Sutton's IDBD method can find the best vector step size for the LMS algorithm, and Schraudolph's ELK1 method, an extension of IDBD to neural networks, has proven effective on large applications such as 3D hand tracking. However, to date all such step-size adaptation methods have included a tunable step-size parameter of their own, which we call the meta-step-size parameter. In this paper we show that the performance of existing step-size adaptation methods is strongly dependent on the choice of their meta-step-size parameter and that this parameter cannot be set reliably in a problem-independent way. We introduce a series of modifications and normalizations to the IDBD method that together eliminate the need to tune the meta-step-size parameter to the particular problem. We show that the resulting overall algorithm, called Autostep, performs as well as or better than existing step-size adaptation methods on a number of idealized and robot prediction problems, and does not require any tuning of its meta-step-size parameter. The ideas behind Autostep are not restricted to the IDBD method; the same principles are potentially applicable to other incremental learning settings, such as reinforcement learning.
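
As context for the modifications described above, the following is a minimal sketch of the baseline IDBD update for LMS (reference [6]), not the paper's own code. It assumes NumPy arrays of equal length; the function and variable names are ours, the default value of meta_step_size is arbitrary, and that argument is the meta-step-size parameter that Autostep is designed to eliminate.

import numpy as np

def idbd_update(w, log_alpha, h, x, target, meta_step_size=0.01):
    """One IDBD step for linear (LMS-style) prediction, after Sutton (1992).

    w              -- weight vector
    log_alpha      -- per-feature log step sizes (beta in Sutton's notation)
    h              -- per-feature memory trace used by the meta update
    x              -- input feature vector
    target         -- desired output for this example
    meta_step_size -- the meta-step-size parameter (theta)
    """
    error = target - np.dot(w, x)                  # prediction error (delta)
    log_alpha += meta_step_size * error * x * h    # meta-gradient step on the log step sizes
    alpha = np.exp(log_alpha)                      # per-feature step sizes
    w += alpha * error * x                         # LMS weight update with a vector step size
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * error * x  # decay-then-add trace
    return w, log_alpha, h, error

In use, w and h would typically start at zero and log_alpha at the log of a small initial step size, with the three vectors threaded through successive calls. Per the abstract above, Autostep adds normalizations to this update so that meta_step_size no longer needs per-problem tuning.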

[1] R. Sutton. Gain Adaptation Beats Least Squares. 2006.

[2] Haim Sompolinsky, et al. On-line Learning of Dichotomies: Algorithms and Learning Curves. NIPS, 1995.

[3] Thibault Langlois, et al. Parameter adaptation in stochastic optimization. 1999.

[4] Robert A. Jacobs, et al. Increased rates of convergence through learning rate adaptation. Neural Networks, 1987.

[5] A. Buttfield, et al. Towards a robust BCI: error potentials and online learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2006.

[6] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta. AAAI, 1992.

[7] H. Kushner, et al. Analysis of adaptive step size SA algorithms for parameter tracking. Proceedings of the 33rd IEEE Conference on Decision and Control, 1994.

[8] Stefan Schaal, et al. Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning. Applied Intelligence, 2002.

[9] Nicol N. Schraudolph, et al. Local Gain Adaptation in Stochastic Gradient Descent. 1999.

[10] Ashique Mahmood. Automatic step-size adaptation in incremental supervised learning. 2010.

[11] Richard S. Sutton, et al. Goal Seeking Components for Adaptive Intelligence: An Initial Assessment. 1981.

[12] Patrick Gallinari, et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent. Journal of Machine Learning Research, 2009.

[13] Andreas Ziehe, et al. Adaptive On-line Learning in Changing Environments. NIPS, 1996.

[14] Pierre Priouret, et al. Adaptive Algorithms and Stochastic Approximations. Applications of Mathematics, 1990.

[15] Nicol N. Schraudolph, et al. 3D hand tracking by rapid stochastic gradient descent using a skinning model. 2004.