Incremental Reinforcement Learning With Prioritized Sweeping for Dynamic Environments

In this paper, a novel incremental learning algorithm is presented for reinforcement learning (RL) in dynamic environments, where the rewards of state-action pairs may change over time. The proposed incremental RL (IRL) algorithm learns in dynamic environments without any assumptions or prior knowledge about how the environment changes. First, IRL generates a detector-agent that identifies the changed part of the environment (the drift environment) by executing a virtual RL process. Then, the agent gives priority to the drift environment and its neighboring states, iteratively updating their state-action value functions with the new rewards via dynamic programming. After this prioritized sweeping process, IRL restarts a canonical learning process to obtain a new optimal policy adapted to the new environment. The novelty is that IRL fuses new information into the existing knowledge system incrementally while mitigating the conflict between them. The IRL algorithm is compared to two direct approaches and several state-of-the-art transfer learning methods on classical maze-navigation problems and a multi-robot intelligent warehouse. The experimental results verify that IRL can effectively improve the adaptability and efficiency of RL algorithms in dynamic environments.
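The prioritized sweeping phase described above can be illustrated with a minimal sketch for a tabular, deterministic model: updates are seeded from the drifted states and propagated backward to predecessors in order of Bellman-error magnitude. The function name, the `model` representation, and the thresholds below are illustrative assumptions, not the paper's implementation.

```python
import heapq

def prioritized_sweeping(Q, model, drift_states, gamma=0.95,
                         theta=1e-4, max_updates=1000):
    """Illustrative sketch (not the paper's code) of prioritized sweeping
    after a reward drift.

    Q:            dict state -> dict action -> value (tabular Q-function)
    model:        dict (state, action) -> (reward, next_state), already
                  refreshed with the post-drift rewards
    drift_states: states whose rewards were detected as changed
    """
    # Build a predecessor map so updates can propagate backward.
    predecessors = {}
    for (s, a), (_, s2) in model.items():
        predecessors.setdefault(s2, set()).add((s, a))

    # Seed the queue from the drift region; heapq is a min-heap,
    # so priorities are negated Bellman errors.
    pq = []
    for s in drift_states:
        for a in [a for (st, a) in model if st == s]:
            r, s2 = model[(s, a)]
            err = abs(r + gamma * max(Q[s2].values()) - Q[s][a])
            if err > theta:
                heapq.heappush(pq, (-err, s, a))

    for _ in range(max_updates):
        if not pq:
            break
        _, s, a = heapq.heappop(pq)
        r, s2 = model[(s, a)]
        Q[s][a] = r + gamma * max(Q[s2].values())  # one-step DP backup
        # Re-queue predecessors whose values are now stale.
        for (sp, ap) in predecessors.get(s, ()):
            rp, _ = model[(sp, ap)]
            err = abs(rp + gamma * max(Q[s].values()) - Q[sp][ap])
            if err > theta:
                heapq.heappush(pq, (-err, sp, ap))
    return Q
```

On a small acyclic chain whose reward changed at a single state, only the drifted state and its predecessors are backed up, which is what makes the approach cheaper than re-learning the whole environment from scratch.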
