Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their own future data distribution. In the presence of function approximation, this coupling can lead to a problematic type of interference that we call 'ray interference', characterized by learning dynamics that sequentially traverse a series of performance plateaus, effectively constraining the agent to learn one thing at a time even when learning several things in parallel would be better. We establish the conditions under which ray interference occurs, show its relation to saddle points, and obtain the exact learning dynamics in a restricted setting. We characterize a number of its properties and discuss possible remedies.
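
To make the mechanism concrete, here is a minimal sketch in Python/NumPy (our illustration, not the paper's own construction) combining the two ingredients the abstract names: per-component gradients that are negatively aligned (interference), and an update for each component scaled by current performance on it, mimicking an agent that generates more data where it already succeeds. All parameter values are illustrative, chosen only to make the plateau visible in a short run; gradient ascent on the summed objective then learns the two components one at a time, with the second stuck on a plateau until the first saturates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two components whose gradient directions have negative inner product,
# i.e. improving one tends to degrade the other (interference).
A = np.array([[1.0, -0.3],
              [-0.3, 1.0]])
c = 2.0                        # offset: both components start at low performance
eps = 0.1                      # residual exploration so learning never fully stalls
eta = 1.0                      # step size
theta = np.array([0.2, 0.0])   # slight asymmetry decides which component wins first

for t in range(6001):
    perf = sigmoid(A @ theta - c)          # per-component performance J_k in (0, 1)
    # Gradient of J_k is sigma'(a_k . theta - c) * a_k; the (eps + perf[k])
    # factor mimics the RL data-generation coupling: components the agent
    # already performs well on receive stronger learning signals.
    grad = sum((eps + perf[k]) * perf[k] * (1.0 - perf[k]) * A[k]
               for k in range(2))
    theta = theta + eta * grad
    if t % 500 == 0:
        print(f"t={t:5d}  J_1={perf[0]:.3f}  J_2={perf[1]:.3f}")
```

Running this prints J_1 rising quickly to near 1 while J_2 first dips under interference from the winner's gradient, then sits on a long plateau before recovering once J_1 saturates and its gradient vanishes: the sequential, one-thing-at-a-time dynamics described above.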
