Unsupervised Cross-Domain Transfer in Policy Gradient Reinforcement Learning via Manifold Alignment

The success of applying policy gradient reinforcement learning (RL) to difficult control tasks hinges on the ability to determine a sensible initialization for the policy. Transfer learning methods tackle this problem by reusing knowledge gleaned from solving other related tasks. In the case of multiple task domains, these algorithms require an inter-task mapping to facilitate knowledge transfer across domains. However, there are currently no general methods to learn an inter-task mapping without requiring either background knowledge that is not typically present in RL settings, or an expensive analysis of a number of candidate inter-task mappings that grows exponentially with the size of the state and action spaces. This paper introduces an autonomous framework that uses unsupervised manifold alignment to learn inter-task mappings and effectively transfer samples between different task domains. Empirical results on diverse dynamical systems, including an application to quadrotor control, demonstrate its effectiveness for cross-domain transfer in the context of policy gradient RL.
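To make the idea of an inter-task mapping concrete, the following is a minimal sketch, not the paper's algorithm: it embeds state samples from two domains of different dimensionality with PCA and fits an orthogonal Procrustes rotation between the resulting latent spaces. The paper's framework recovers correspondences without supervision via manifold alignment; this sketch assumes paired samples purely for illustration, and all data and names here are hypothetical.

```python
import numpy as np

def pca_embed(X, k):
    """Embed samples in a k-dimensional latent space via PCA."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # project onto top-k components

def procrustes_rotation(Z_src, Z_tgt):
    """Orthogonal rotation R minimizing ||Z_src @ R - Z_tgt||_F."""
    U, _, Vt = np.linalg.svd(Z_src.T @ Z_tgt)
    return U @ Vt

# Hypothetical state samples from two domains with different dimensionalities.
rng = np.random.default_rng(0)
X_source = rng.normal(size=(500, 6))                  # source-domain states (6-D)
X_target = X_source[:, :4] @ rng.normal(size=(4, 4))  # related target states (4-D)

k = 3                                             # assumed shared latent dimension
Z_src, Z_tgt = pca_embed(X_source, k), pca_embed(X_target, k)
R = procrustes_rotation(Z_src, Z_tgt)             # latent-space inter-task mapping
Z_transferred = Z_src @ R                         # source samples mapped toward target
print("alignment residual:", np.linalg.norm(Z_transferred - Z_tgt))
```

In the setting the abstract describes, samples mapped through such an alignment would be lifted back into the target state space and used to initialize policy gradient learning in the target domain.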
