Student-Initiated Action Advising via Advice Novelty

Action advising is a knowledge exchange mechanism between peers, namely a student and a teacher, that can help tackle the exploration and sample-inefficiency problems in deep reinforcement learning. Due to practical limitations in peer-to-peer communication and the negative implications of over-advising, the peer responsible for initiating these interactions needs to do so only when it is most appropriate to exchange advice. Most recently, student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results. However, these estimations have several weaknesses: they carry no information about the student's convergence, and they are subject to the delays introduced by experience replay dynamics. We propose a student-initiated action advising algorithm that alleviates these shortcomings. Specifically, we employ Random Network Distillation (RND) to measure the novelty of a piece of advice, which the student uses to decide whether to proceed with a request; furthermore, we perform RND updates only for the advised states, ensuring that the student's convergence does not prevent it from utilising the teacher's knowledge at any stage of learning. Experiments in GridWorld and simplified versions of five Atari games show that our approach performs on par with the state of the art and offers significant advantages in scenarios where existing methods are prone to fail.
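
To make the mechanism concrete, below is a minimal sketch of how RND-based advice novelty could gate a student's advice requests. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: the network sizes, the `threshold` and `budget` values, and the `teacher`/`student` policy callables are hypothetical placeholders, and PyTorch is assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RND(nn.Module):
    """Random Network Distillation: a frozen, randomly initialised target
    network and a trained predictor; the predictor's error on a state is
    used as that state's novelty estimate."""
    def __init__(self, obs_dim, embed_dim=64):
        super().__init__()
        self.target = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        for p in self.target.parameters():  # the target is never trained
            p.requires_grad_(False)

    def novelty(self, obs):
        with torch.no_grad():
            return F.mse_loss(self.predictor(obs), self.target(obs)).item()

    def update(self, obs, optimiser):
        # Train the predictor towards the target on this observation only,
        # so novelty decays exclusively for states that were advised.
        loss = F.mse_loss(self.predictor(obs), self.target(obs))
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()


class AdviceNoveltyStudent:
    """Requests advice only when the current state's advice novelty is
    high, and updates the RND predictor only for advised states."""
    def __init__(self, rnd, teacher, student, threshold=0.01, budget=1000):
        self.rnd = rnd
        self.optimiser = torch.optim.Adam(rnd.predictor.parameters(), lr=1e-4)
        self.teacher = teacher      # teacher policy: obs -> action (assumed)
        self.student = student      # student's own policy: obs -> action
        self.threshold = threshold  # hypothetical novelty cut-off
        self.budget = budget        # remaining advice budget

    def act(self, obs):
        x = obs.unsqueeze(0)
        if self.budget > 0 and self.rnd.novelty(x) > self.threshold:
            self.budget -= 1
            self.rnd.update(x, self.optimiser)  # decay novelty here only
            return self.teacher(obs)            # execute the advised action
        return self.student(obs)                # otherwise act autonomously
```

Confining the RND update to the advised branch is the key detail this sketch tries to capture: novelty decays only where advice has already been collected, so the student's own policy convergence never suppresses future requests in still-unfamiliar states.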
