An Experts Algorithm for Transfer Learning

A long-lived agent continually faces new tasks in its environment. Such an agent may be able to use knowledge learned in solving earlier tasks to produce candidate policies for its current task. Prior experience may, however, suggest several reasonable policies, and the agent must choose among them, potentially without any a priori knowledge of how well each applies to its current situation. We present an "experts" algorithm for efficiently choosing among candidate policies when solving an unknown Markov decision process task. We conclude with the results of experiments in two domains, in which we generate candidate policies from solutions to related tasks and use our experts algorithm to choose among them.
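To make the selection problem concrete, the sketch below treats each candidate policy as an arm of a multi-armed bandit and picks among them with the standard UCB1 rule. This is not the algorithm presented in the paper; UCB1 is swapped in purely for illustration. The sketch assumes the task can be run in episodes whose returns are scaled to [0, 1], and it ignores the state dynamics of the underlying Markov decision process. The names `select_policy`, `run_experts`, and `evaluate`, and the simulated success rates, are hypothetical.

```python
import math
import random


def select_policy(counts, mean_returns, t):
    """UCB1 choice: try every candidate once, then maximize mean + bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: mean_returns[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
    )


def run_experts(evaluate, num_policies, num_episodes):
    """Repeatedly pick a candidate policy, run it, and update its estimate."""
    counts = [0] * num_policies
    mean_returns = [0.0] * num_policies
    for t in range(1, num_episodes + 1):
        i = select_policy(counts, mean_returns, t)
        r = evaluate(i)  # assumed: one episode's return, scaled to [0, 1]
        counts[i] += 1
        mean_returns[i] += (r - mean_returns[i]) / counts[i]  # running mean
    return counts, mean_returns


# Hypothetical stand-in for the unknown task: each candidate policy
# succeeds with a fixed probability that the agent does not know.
rng = random.Random(0)
true_means = [0.2, 0.5, 0.8]


def evaluate(i):
    return 1.0 if rng.random() < true_means[i] else 0.0


counts, mean_returns = run_experts(evaluate, len(true_means), 2000)
print(counts, [round(m, 2) for m in mean_returns])
```

Under these assumptions the agent needs no prior knowledge of which candidate suits the task: the exploration bonus shrinks as a candidate is tried more often, so episodes concentrate on whichever policy earns the highest observed return.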
