On the role of tracking in stationary environments

It is often thought that learning algorithms that track the best solution, as opposed to converging to it, are important only on nonstationary problems. We present three results suggesting that this is not so. First, we illustrate in a simple concrete example, the Black and White problem, that tracking can perform better than any converging algorithm on a stationary problem. Second, we demonstrate the same point on a larger, more realistic problem, an application of temporal-difference learning to computer Go. Our third result suggests that tracking in stationary problems could be important for meta-learning research (e.g., learning to learn, feature selection, transfer). We apply a meta-learning algorithm for step-size adaptation, IDBD (Sutton, 1992), to the Black and White problem, showing that meta-learning has a dramatic long-term effect on performance whereas, on an analogous converging problem, it has only a small second-order effect. This result suggests a way of eventually overcoming a major obstacle to meta-learning research: the lack of an independent methodology for task selection.
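For readers unfamiliar with it, the IDBD rule referred to above is compact enough to sketch directly. The Python below is a minimal illustration of the published update for a linear predictor, not the code used in this paper's experiments; the meta step-size theta and the initial step-size of 0.05 in the toy usage are arbitrary illustrative choices.

    import numpy as np

    def idbd_update(w, beta, h, x, target, theta=0.01):
        # One IDBD step (Sutton, 1992) for a linear predictor y = w . x.
        #   w      weight vector
        #   beta   per-weight log step-sizes (alpha_i = exp(beta_i))
        #   h      per-weight traces carrying the meta-gradient memory
        #   theta  meta step-size, the algorithm's only free parameter
        delta = target - w @ x                 # prediction error
        beta += theta * delta * x * h          # meta-gradient step on log step-sizes
        alpha = np.exp(beta)                   # current per-weight step-sizes
        w += alpha * delta * x                 # ordinary learning step
        # Decay the trace, clipping the decay factor at zero, then add the new term.
        h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
        return w, beta, h

    # Toy usage: learn a fixed linear target (hypothetical setup for illustration).
    rng = np.random.default_rng(0)
    n = 5
    w, beta, h = np.zeros(n), np.full(n, np.log(0.05)), np.zeros(n)
    w_true = rng.normal(size=n)
    for _ in range(10_000):
        x = rng.normal(size=n)
        w, beta, h = idbd_update(w, beta, h, x, w_true @ x)

Because the step-sizes are adapted in log space, they stay positive and can move over many orders of magnitude, which is what gives meta-learning its long-term leverage on tracking problems.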

[1] Richard S. Sutton. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta. AAAI, 1992.

[2] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go. NIPS, 1993.

[3] Gerald Tesauro. Temporal Difference Learning and TD-Gammon. J. Int. Comput. Games Assoc., 1995.

[4] Nicol N. Schraudolph, et al. Local Gain Adaptation in Stochastic Gradient Descent. 1999.

[5] Markus Enzenberger, et al. Evaluation in Go by a Neural Network using Soft Segmentation. ACG, 2003.

[6] Shai Ben-David, et al. Exploiting Task Relatedness for Multiple Task Learning. COLT, 2003.

[7] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion. ICRA, 2004.

[8] Ben Tse, et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning. ISER, 2004.

[9] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 1988.

[10] G. Konidaris. A Framework for Transfer in Reinforcement Learning. 2006.

[11] Andrew G. Barto, et al. Autonomous shaping: knowledge transfer in reinforcement learning. ICML, 2006.

[12] Massimiliano Pontil, et al. Best of NIPS 2005: Highlights on the 'Inductive Transfer: 10 Years Later' Workshop. 2006.

[13] R. Sutton. Gain Adaptation Beats Least Squares. 2006.

[14] Richard S. Sutton, et al. Reinforcement Learning of Local Shape in the Game of Go. IJCAI, 2007.