Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning
Mahmoud Assran | Joshua Romoff | Nicolas Ballas | Joelle Pineau | Michael G. Rabbat