Marcin Andrychowicz | Anton Raichuk | Piotr Stańczyk | Manu Orsini | Sertan Girgin | Raphaël Marinier | Léonard Hussenot | Matthieu Geist | Olivier Pietquin | Marcin Michalski | Sylvain Gelly | Olivier Bachem
[1] David Janz, et al. Learning to Drive in a Day, 2018, 2019 International Conference on Robotics and Automation (ICRA).
[2] Piotr Stańczyk, et al. SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference, 2020, ICLR.
[3] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[4] Sergey Levine, et al. When to Trust Your Model: Model-Based Policy Optimization, 2019, NeurIPS.
[5] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[6] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[7] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[8] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[9] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[10] Pieter Abbeel, et al. Benchmarking Model-Based Reinforcement Learning, 2019, ArXiv.
[11] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[12] Sergey Levine, et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, 2019, ArXiv.
[13] Jakub W. Pachocki, et al. Learning dexterous in-hand manipulation, 2018, Int. J. Robotics Res..
[14] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[15] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[16] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[17] Vitaly Levdik, et al. Time Limits in Reinforcement Learning, 2017, ICML.
[18] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[19] Peter Henderson, et al. Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control, 2017, ArXiv.
[20] Kaiming He, et al. Designing Network Design Spaces, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Mario Lucic, et al. Are GANs Created Equal? A Large-Scale Study, 2017, NeurIPS.
[22] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[23] H. Francis Song, et al. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control, 2019, ICLR.
[24] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.
[25] Marcin Andrychowicz, et al. Solving Rubik's Cube with a Robot Hand, 2019, ArXiv.
[26] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[27] Sergey Levine, et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning, 2018, ICML.
[28] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[29] Larry Rudolph, et al. Implementation Matters in Deep RL: A Case Study on PPO and TRPO, 2020, ICLR.
[30] Bernhard Schölkopf, et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, 2018, ICML.
[31] Frederick R. Forst, et al. On robust estimation of the location parameter, 1980.
[32] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[33] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[34] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[35] J. Andrew Bagnell, et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[36] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.