Skill-based transfer learning with domain adaptation for continuous reinforcement learning domains

Although reinforcement learning is an effective machine learning technique, it can perform poorly on complex, real-world problems, converging slowly. The issue is magnified in continuous domains, where the curse of dimensionality is unavoidable and generalization is essential. Transfer learning remedies this problem and yields significant improvements in learning performance by providing generalization not only within a task but also across different but related tasks. The critical question in transfer learning is how to incorporate knowledge acquired while learning a different but related task in the past, and domain adaptation is a promising paradigm for addressing it. In this paper, we propose a novel skill-based Transfer Learning with Domain Adaptation (TLDA) approach for continuous RL problems. TLDA discovers and learns skills as high-level knowledge from the source task and then uses a domain adaptation technique to help the agent discover a state-action mapping that relates the source and target tasks. With this mapping, TLDA adapts the source skills and speeds up learning on a new target task. Experimental results confirm that TLDA provides effective transfer learning for continuous reinforcement learning problems.
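
The abstract outlines a three-step pipeline: discover skills in the source task, learn a cross-task state mapping via domain adaptation, then reuse the adapted skills in the target task. The minimal Python sketch below illustrates only that general shape, with loudly labeled stand-ins: PCA-axis alignment substitutes for the paper's actual domain-adaptation step, and a toy subgoal-seeking controller substitutes for a discovered skill. Every name and design choice here is an illustrative assumption, not the paper's method.

```python
"""Minimal sketch of a TLDA-style skill-transfer pipeline.

Assumptions (not from the paper): PCA-axis alignment stands in for the
domain-adaptation step, and a nearest-subgoal controller stands in for
a discovered source skill.
"""
import numpy as np

rng = np.random.default_rng(0)


def principal_axes(states, k):
    """Top-k principal directions of a set of states (rows of Vt)."""
    centered = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def fit_state_map(target_states, source_states, k=2):
    """Linear map from target to source coordinates, obtained by
    aligning each domain's top-k principal axes -- a crude stand-in
    for manifold-alignment-style domain adaptation. A real method
    would also resolve PCA's sign/ordering ambiguities."""
    v_tgt = principal_axes(target_states, k)
    v_src = principal_axes(source_states, k)
    # Project a target state onto its own axes, then rebuild it in
    # the source domain's axes: s_src ~= V_src^T V_tgt s_tgt.
    return v_src.T @ v_tgt


class SourceSkill:
    """Toy skill: drive the 2-D source state toward a subgoal."""

    def __init__(self, subgoal):
        self.subgoal = np.asarray(subgoal, dtype=float)

    def action(self, source_state):
        # Greedy step toward the subgoal, capped at unit length.
        step = self.subgoal - source_state
        norm = np.linalg.norm(step)
        return step if norm < 1.0 else step / norm


def transfer_action(skill, state_map, target_state):
    """Adapt a source skill to the target task: map the target state
    into source coordinates and query the source skill there."""
    return skill.action(state_map @ target_state)


if __name__ == "__main__":
    # Unpaired state samples from two related 2-D tasks; here the
    # target task is simply a rotated copy of the source task.
    source_states = rng.normal(size=(500, 2)) * [3.0, 1.0]
    theta = np.pi / 4
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    target_states = source_states @ rot.T

    state_map = fit_state_map(target_states, source_states)
    skill = SourceSkill(subgoal=[2.0, 0.0])
    print(transfer_action(skill, state_map, target_states[0]))
```

The sketch assumes a shared action space; with distinct action spaces, the adapted action would also need to pass through an action mapping, which is part of the state-action relation the paper's domain-adaptation step is meant to recover.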
