Online Transfer Learning in Reinforcement Learning Domains

This paper proposes an online transfer framework to capture the interaction among agents and shows that current transfer learning in reinforcement learning is a special case of online transfer. Furthermore, this paper re-characterizes existing agents-teaching-agents methods as online transfer and analyzes one such teaching method in three ways. First, the convergence of Q-learning and Sarsa with a tabular representation and a finite budget is proven. Second, the convergence of Q-learning and Sarsa with linear function approximation is established. Third, we show that asymptotic performance cannot be hurt through teaching. Additionally, all theoretical results are empirically validated.
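To make the teaching setting concrete, the following is a minimal sketch of tabular Q-learning in which a teacher may override the student's action until a finite advice budget is exhausted. It is an illustration under stated assumptions, not the method analyzed in the paper: the function names, the five-state chain task, the "advise on every step until the budget runs out" rule, and all hyperparameters are hypothetical choices made for this example.

```python
import random


def chain_step(state, action):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1 and ends the episode.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


def q_learning_with_advice(n_states, n_actions, step, teacher_policy, budget,
                           episodes=200, max_steps=100, alpha=0.1,
                           gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning where a teacher supplies actions while budget remains."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    advice_left = budget
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if advice_left > 0:
                # Assumed advising rule: the teacher advises on every step
                # until the budget is spent.
                action = teacher_policy(state)
                advice_left -= 1
            elif rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Standard Q-learning target; no bootstrapping past a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q, advice_left


if __name__ == "__main__":
    # The teacher here always recommends moving right (assumed to be optimal).
    q_table, remaining = q_learning_with_advice(
        n_states=5, n_actions=2, step=chain_step,
        teacher_policy=lambda s: 1, budget=50)
    print("advice remaining:", remaining)
    print("Q-values in state 3:", q_table[3])
```

Because the budget is finite, the student's updates after the last piece of advice are ordinary Q-learning updates, which is the intuition behind the claim that teaching does not alter the conditions for convergence. Replacing the off-policy target with the value of the action actually taken next would give the corresponding Sarsa variant.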
