Razvan Pascanu | Max Jaderberg | Simon Osindero | Wojciech M. Czarnecki | Grzegorz Swirszcz | Siddhant M. Jayakumar
[1] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[2] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[3] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[4] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[5] Geoffrey J. Gordon, et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, 2010, AISTATS.
[6] Andrew Zisserman, et al. Kickstarting Deep Reinforcement Learning, 2018, ArXiv.
[7] Huchuan Lu, et al. Deep Mutual Learning, 2018, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Pieter Abbeel, et al. Gradient Estimation Using Stochastic Computation Graphs, 2015, NIPS.
[9] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[10] Shu Wang, et al. Collaborative Deep Reinforcement Learning, 2017, ArXiv.
[11] Sam Devlin, et al. Dynamic potential-based reward shaping, 2012, AAMAS.
[12] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[13] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[14] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[15] Sinno Jialin Pan, et al. Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay, 2017, AAAI.
[16] Sergey Levine, et al. Divide-and-Conquer Reinforcement Learning, 2017, ICLR.
[17] Yee Whye Teh, et al. Mix&Match - Agent Curricula for Reinforcement Learning, 2018, ICML.
[18] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[19] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[20] Chong Li, et al. Multi-task Learning for Continuous Control, 2018, ArXiv.
[21] Dan Alistarh, et al. Model compression via distillation and quantization, 2018, ICLR.
[22] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[23] Pieter Abbeel, et al. Apprenticeship learning via inverse reinforcement learning, 2004, ICML.
[24] Jian Peng, et al. Genetic Policy Optimization, 2017, ICLR 2018.
[25] Ruslan Salakhutdinov, et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, 2015, ICLR.
[26] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, ArXiv.
[27] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[28] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[29] Thomas P. Minka. Divergence measures and message passing, 2005.
[30] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res.
[31] Yee Whye Teh, et al. Distral: Robust Multitask Reinforcement Learning, 2017, NIPS.
[32] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[33] Michael L. Littman, et al. Potential-based Shaping in Model-based Reinforcement Learning, 2008, AAAI.