[1] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[2] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[3] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[4] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[5] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[6] Dale Schuurmans, et al. Smoothed Action Value Functions for Learning Gaussian Policies, 2018, ICML.
[7] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[8] Kagan Tumer, et al. Evolution-Guided Policy Gradient in Reinforcement Learning, 2018, NeurIPS.
[9] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[10] Shimon Whiteson, et al. Expected Policy Gradients, 2017, AAAI.
[11] Lawrence Carin, et al. Revisiting the Softmax Bellman Operator: New Benefits and New Perspective, 2018, ICML.
[12] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[13] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[14] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[15] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[16] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[17] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[18] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[19] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[20] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[21] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[22] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[23] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[24] Qingfeng Lan, et al. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning, 2020, ICLR.
[25] Pieter Abbeel, et al. Towards Characterizing Divergence in Deep Q-Learning, 2019, ArXiv.
[26] Sergey Levine, et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, ArXiv.
[27] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[28] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[29] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[30] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[31] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[32] Tie-Yan Liu, et al. Reinforcement Learning with Dynamic Boltzmann Softmax Updates, 2019, IJCAI.
[33] Nicolas Le Roux, et al. Understanding the impact of entropy on policy optimization, 2018, ICML.
[34] Michael I. Jordan, et al. MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[35] Claudio Gentile, et al. Boltzmann Exploration Done Right, 2017, NIPS.
[36] Kavosh Asadi, et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.