Knowledge Transfer between Multi-granularity Models for Reinforcement Learning

Reinforcement learning (RL) is a widely used machine learning method that is effective for decision and control problems in which skills must be learned. This paper proposes a knowledge transfer method between multi-granularity models for RL, intended to speed up learning and to adapt to dynamic environments. Learning runs on naturally organized models at multiple granularities, e.g., a coarse-grained model and a fine-grained model. Together, these models form a knowledge transfer architecture that bridges reinforcement learning across granularity levels. The proposed multi-granularity reinforcement learning (MGRL) approach and its related algorithms can scale up well and speed up learning at one granularity level by drawing on the learning process at another. Several groups of simulation experiments are carried out on a puzzle problem in a gridworld environment. The results demonstrate the effectiveness and efficiency of the proposed approach.
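
The abstract does not spell out the transfer rule, so the following is only a minimal sketch of one plausible reading, not the MGRL algorithm itself. It assumes tabular Q-learning on a square gridworld, a coarse model obtained by merging `FACTOR x FACTOR` blocks of fine cells, and transfer by copying each coarse cell's Q-values into the fine cells it covers. All names (`step`, `q_learning`, `to_coarse`, `transfer`), the reward values, and the hyperparameters are hypothetical choices made for illustration.

```python
import random

# Assumed action set: up, down, left, right (row, column offsets).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, size, goal):
    """Move one cell in a size x size grid; walls clamp the move."""
    dr, dc = ACTIONS[action]
    nxt = (min(max(state[0] + dr, 0), size - 1),
           min(max(state[1] + dc, 0), size - 1))
    reward = 1.0 if nxt == goal else -0.01  # assumed reward scheme
    return nxt, reward, nxt == goal


def q_learning(size, goal, episodes, q=None, alpha=0.5, gamma=0.95,
               eps=0.1, max_steps=200, seed=0):
    """Tabular epsilon-greedy Q-learning; `q` optionally seeds the table."""
    rng = random.Random(seed)
    if q is None:
        q = {(r, c): [0.0] * 4 for r in range(size) for c in range(size)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda i: q[state][i])
            nxt, reward, done = step(state, a, size, goal)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def to_coarse(cell, factor):
    """Map a fine-grained cell to the coarse cell that contains it."""
    return (cell[0] // factor, cell[1] // factor)


def transfer(q_coarse, fine_size, factor):
    """Initialize a fine-grained Q-table from coarse-grained Q-values."""
    return {(r, c): list(q_coarse[to_coarse((r, c), factor)])
            for r in range(fine_size) for c in range(fine_size)}


FINE, FACTOR = 8, 2
fine_goal = (7, 7)

# Learn on the small coarse grid first, then refine on the fine grid.
q_coarse = q_learning(FINE // FACTOR, to_coarse(fine_goal, FACTOR), episodes=200)
q_fine = q_learning(FINE, fine_goal, episodes=50,
                    q=transfer(q_coarse, FINE, FACTOR))
```

The design idea in this sketch is that the coarse grid has far fewer states, so its Q-table is cheap to learn and can give the fine-grained learner a rough sense of direction before it has seen any reward. Whether and how much this helps depends on the task; the sketch makes no claim about matching the paper's results.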