DECAF: Deep Case-based Policy Inference for knowledge transfer in Reinforcement Learning

Abstract

The ability to solve increasingly complex problems with Reinforcement Learning (RL) has raised interest in systematic approaches to retaining and reusing knowledge across a variety of tasks. Case-based Reasoning (CBR) is a general methodology that provides a framework for such knowledge transfer, but it has been underrepresented in the RL literature so far. We formulate a terminology for the CBR framework targeted at RL researchers, with the goal of facilitating communication between the two research communities. Based on this framework, we propose the Deep Case-based Policy Inference (DECAF) algorithm, which accelerates learning by building a library of cases and reusing them when training a policy for a new task to which they are similar. DECAF guides training by dynamically selecting and blending policies according to their usefulness for the current target task; it reuses previously learned policies for more effective exploration while still allowing adaptation to the particularities of the new task. We present an empirical evaluation in the Atari game-playing domain that shows the benefits of our algorithm with regard to sample efficiency, robustness against negative transfer, and performance when compared to state-of-the-art methods.
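The abstract does not give the concrete selection or blending rules, so the following is only a minimal sketch of the general idea, not the DECAF algorithm itself. It assumes a softmax over running-average episode returns for choosing a stored policy, and a fixed reuse probability for mixing the chosen policy with the one being trained (in the spirit of probabilistic policy reuse). All names here (`CaseLibrary`, `blended_action`, `temperature`, `reuse_prob`) are hypothetical and chosen for illustration.

```python
import math
import random
from typing import Callable, Dict

# Assumption: a policy is any function from a state id to an action id.
Policy = Callable[[int], int]


class CaseLibrary:
    """Hypothetical case library: stored policies plus a usefulness estimate."""

    def __init__(self, temperature: float = 1.0) -> None:
        self.policies: Dict[str, Policy] = {}
        self.usefulness: Dict[str, float] = {}
        self.counts: Dict[str, int] = {}
        self.temperature = temperature

    def add_case(self, name: str, policy: Policy) -> None:
        """Store a previously learned policy with a neutral usefulness of 0."""
        self.policies[name] = policy
        self.usefulness[name] = 0.0
        self.counts[name] = 0

    def selection_probabilities(self) -> Dict[str, float]:
        """Softmax over usefulness (assumed rule; max is subtracted for stability)."""
        names = list(self.policies)
        top = max(self.usefulness[n] for n in names)
        weights = [
            math.exp((self.usefulness[n] - top) / self.temperature) for n in names
        ]
        total = sum(weights)
        return {n: w / total for n, w in zip(names, weights)}

    def select(self, rng: random.Random) -> str:
        """Sample the name of a stored policy according to the softmax."""
        probs = self.selection_probabilities()
        names = list(probs)
        return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

    def update(self, name: str, episode_return: float) -> None:
        """Track usefulness as the running mean of returns obtained with a case."""
        self.counts[name] += 1
        step = (episode_return - self.usefulness[name]) / self.counts[name]
        self.usefulness[name] += step


def blended_action(
    state: int,
    source: Policy,
    target: Policy,
    reuse_prob: float,
    rng: random.Random,
) -> int:
    """Follow the reused policy with probability reuse_prob, else the new one."""
    if rng.random() < reuse_prob:
        return source(state)
    return target(state)
```

In a training loop under these assumptions, one would call `select` at the start of each episode, act with `blended_action` during it, and pass the episode return to `update`, so that policies which do not help on the target task are sampled less often over time.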
