Policy Transfer using Reward Shaping

Transfer learning has proven to be an effective approach for speeding up reinforcement learning. Existing techniques often rely on low-level information obtained in the source task to achieve transfer in the target task. Yet the most general transfer approach can assume access only to the output of the learning process in the source task, i.e. the learned policy, which enables transfer irrespective of the learning algorithm used in the source task. We advance the state of the art with a reward-shaping approach to policy transfer. One advantage of such an approach is that it firmly grounds policy transfer in an actively developing body of theoretical research on reward shaping. Experiments in Mountain Car, Cart Pole and Mario demonstrate the practical usefulness of the approach.
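To make the idea concrete, the sketch below shows one way a policy learned in the source task could be encoded as a state-action potential and turned into a potential-based shaping reward in the target task, in the spirit of the look-ahead advice formulation of Wiewiora et al. The helper names (source_policy, map_state, map_action) are hypothetical inter-task mappings assumed for illustration; this is a minimal sketch under those assumptions, not the paper's implementation.

```python
# Minimal sketch (illustrative, not the paper's code): potential-based reward
# shaping driven by a transferred source-task policy.

GAMMA = 0.99   # discount factor of the target task
SCALE = 1.0    # magnitude of the shaping potential (a tuning assumption)

def potential(state, action, source_policy, map_state, map_action):
    """Phi(s, a): positive when the target-task action agrees with the action
    the source policy would take in the corresponding source-task state."""
    advised = source_policy(map_state(state))          # hypothetical inter-task mapping
    return SCALE if map_action(action) == advised else 0.0

def shaping_reward(s, a, s_next, a_next, source_policy, map_state, map_action):
    """Look-ahead, potential-based shaping over state-action pairs:
    F(s, a, s', a') = gamma * Phi(s', a') - Phi(s, a)."""
    return (GAMMA * potential(s_next, a_next, source_policy, map_state, map_action)
            - potential(s, a, source_policy, map_state, map_action))

# In an on-policy learner such as SARSA, the agent would update on
# r + F(s, a, s', a') rather than the environment reward r alone, so the
# transferred policy biases exploration in the target task.
```

Because the added signal is a difference of potentials, the guidance inherits the invariance guarantees developed in the reward-shaping literature, which is exactly the theoretical grounding the abstract refers to.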
