[1] Patrick MacAlpine et al. UT Austin Villa: RoboCup 2016 3D Simulation League Competition and Technical Challenges Champions, 2015, Robot Soccer World Cup.
[2] Marc G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[3] David Silver et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[4] John Schulman et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[5] Espen Bernton et al. Approximate Bayesian computation with the Wasserstein distance, 2019, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[6] Marc G. Bellemare et al. The Cramer Distance as a Solution to Biased Wasserstein Gradients, 2017, ArXiv.
[7] Yunhao Tang et al. Implicit Policy for Reinforcement Learning, 2018, ArXiv.
[8] Volodymyr Mnih et al. Human-level control through deep reinforcement learning, 2015, Nature.
[9] Tuomas Haarnoja et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[10] Laurent Dinh et al. Density estimation using Real NVP, 2016, ICLR.
[11] Frederick R. Forst et al. On robust estimation of the location parameter, 1980.
[12] Ishan Deshpande et al. Generative Modeling Using the Sliced Wasserstein Distance, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Rahul Singh et al. Sample-based Distributional Policy Gradient, 2020, L4DC.
[14] Marcin Andrychowicz et al. Hindsight Experience Replay, 2017, NIPS.
[15] Ian J. Goodfellow et al. Generative Adversarial Nets, 2014, NIPS.
[16] Patrick MacAlpine et al. UT Austin Villa: RoboCup 2015 3D Simulation League Competition and Technical Challenges Champions, 2015, RoboCup.
[17] Hado van Hasselt et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[18] Patrick Nadeem Ward et al. Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies, 2019, ArXiv.
[19] Logan Engstrom et al. Implementation Matters in Deep RL: A Case Study on PPO and TRPO, 2020, ICLR.
[20] Volodymyr Mnih et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[21] Marc G. Bellemare et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[22] Timothy P. Lillicrap et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[23] David Silver et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[24] Will Dabney et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[25] Gabriel Barth-Maron et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[26] David Silver et al. Mastering the game of Go without human knowledge, 2017, Nature.
[27] C. Villani. Optimal Transport: Old and New, 2008.
[28] Soheil Kolouri et al. Generalized Sliced Wasserstein Distances, 2019, NeurIPS.
[29] S. Goldstein et al. On intrinsic randomness of dynamical systems, 1981.
[30] Scott Fujimoto et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[31] Chen Tessler et al. Distributional Policy Optimization: An Alternative Approach for Continuous Control, 2019, NeurIPS.
[32] Danilo Jimenez Rezende and Shakir Mohamed. Variational Inference with Normalizing Flows, 2015, ICML.
[33] Tuomas Haarnoja et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[34] Brian D. Ziebart et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[35] Will Dabney et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[36] Brian D. Ziebart. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.
[37] David Silver et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[38] Itamar Arel et al. Reinforcement learning-based multi-agent system for network traffic signal control, 2010.
[39] Zhendong Wang and Mingyuan Zhou. Thompson Sampling via Local Uncertainty, 2020, ICML.
[40] Sergey Levine et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res.
[41] Yuguang Yue et al. Discrete Action On-Policy Learning with Action-Value Critic, 2020, AISTATS.
[42] Emanuel Todorov et al. MuJoCo: A physics engine for model-based control, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[43] Christopher J. C. H. Watkins and Peter Dayan. Q-learning, 1992, Machine Learning.
[44] Hongzi Mao et al. Resource Management with Deep Reinforcement Learning, 2016, HotNets.
[45] Ziyu Wang et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[46] Mingzhang Yin and Mingyuan Zhou. Semi-Implicit Variational Inference, 2018, ICML.
[47] Zhenpeng Zhou et al. Optimizing Chemical Reactions with Deep Reinforcement Learning, 2017, ACS Central Science.
[48] Emanuel Todorov. Linearly-solvable Markov decision problems, 2006, NIPS.
[49] Dmitry Molchanov et al. Doubly Semi-Implicit Variational Inference, 2018, AISTATS.
[50] Tuomas Haarnoja et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.