Action Advising with Advice Imitation in Deep Reinforcement Learning

Action advising is a peer-to-peer knowledge exchange technique built on the teacher-student paradigm to alleviate the sample inefficiency problem in deep reinforcement learning (RL). Recently proposed student-initiated approaches have obtained promising results. However, being in the early stages of development, they also have some substantial shortcomings. One ability absent from current methods is reusing advice after it has been collected, which is especially crucial in practical settings given the budget constraints on peer-to-peer interactions. In this study, we present an approach that enables the student agent to imitate previously acquired advice and reuse it directly in its exploration policy, without any intervention in the learning mechanism itself. In particular, we employ a behavioural cloning module to imitate the teacher policy and use dropout regularisation to obtain a notion of epistemic uncertainty, which keeps track of which state-advice pairs have actually been collected. As the results of our experiments in three Atari games show, advice reuse via imitation is indeed a feasible option in deep RL, and our approach achieves it while significantly improving learning performance, even when paired with a simple early advising heuristic.
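
To make the reuse mechanism concrete, the following is a minimal sketch of how a dropout-based uncertainty estimate could gate the reuse of imitated advice. It is an illustration under stated assumptions, not the implementation used in the study: the class `DropoutPolicy`, the functions `imitated_advice` and `choose_action`, the summed-variance uncertainty measure, the number of stochastic passes, and the threshold value are all hypothetical choices, and the tiny hand-weighted network merely stands in for a trained behavioural cloning module.

```python
import math
import random
from statistics import pvariance


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DropoutPolicy:
    """Hypothetical stand-in for a behavioural cloning module:
    one ReLU hidden layer with (inverted) dropout, softmax output."""

    def __init__(self, w_hidden, w_out, drop_prob=0.2, rng=None):
        self.w_hidden = w_hidden    # shape: [hidden][input]
        self.w_out = w_out          # shape: [action][hidden]
        self.drop_prob = drop_prob  # must be < 1.0
        self.rng = rng or random.Random(0)

    def forward(self, state, dropout=True):
        hidden = []
        for row in self.w_hidden:
            h = max(0.0, sum(w * x for w, x in zip(row, state)))
            if dropout:
                # Keep dropout active at inference time to sample sub-networks.
                keep = self.rng.random() >= self.drop_prob
                h = h / (1.0 - self.drop_prob) if keep else 0.0
            hidden.append(h)
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]
        return softmax(logits)


def imitated_advice(policy, state, n_passes=30):
    """Run several stochastic passes; return the mean-argmax action and the
    summed per-action variance as an (assumed) epistemic uncertainty proxy."""
    samples = [policy.forward(state, dropout=True) for _ in range(n_passes)]
    n_actions = len(samples[0])
    mean = [sum(s[a] for s in samples) / n_passes for a in range(n_actions)]
    uncertainty = sum(pvariance([s[a] for s in samples]) for a in range(n_actions))
    action = max(range(n_actions), key=lambda a: mean[a])
    return action, uncertainty


def choose_action(policy, state, student_action, threshold=0.01):
    """Reuse the imitated advice only when the imitator is confident,
    i.e. the state resembles one for which advice was collected."""
    action, uncertainty = imitated_advice(policy, state)
    if uncertainty < threshold:
        return action, "reused advice"
    return student_action, "student policy"
```

In this sketch the student's learning update is untouched: `choose_action` only decides which action is executed during exploration, falling back to the student's own choice whenever the uncertainty estimate exceeds the assumed threshold.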
