Jiangpeng Yan | Xiu Li | Jiafei Lyu | Xiaoteng Ma
[1] Dmitry Vetrov, et al. Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics, 2020, ICML.
[2] Yann LeCun, et al. Regularization of Neural Networks using DropConnect, 2013, ICML.
[3] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[4] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[5] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[6] Robert Loftin, et al. Better Exploration with Optimistic Actor-Critic, 2019, NeurIPS.
[7] Yifan Wu, et al. Behavior Regularized Offline Reinforcement Learning, 2019, arXiv.
[8] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[9] Lawrence Carin, et al. Revisiting the Softmax Bellman Operator: New Benefits and New Perspective, 2018, ICML.
[10] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[11] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[12] Lorenzo Rosasco, et al. On regularization algorithms in learning theory, 2007, J. Complex.
[13] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[14] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[15] Li Xia, et al. DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning, 2020.
[16] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[17] Nahum Shimkin, et al. Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, 2016, ICML.
[18] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[19] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[20] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[21] Mykel J. Kochenderfer, et al. Weighted Double Q-learning, 2017, IJCAI.
[22] Shimon Whiteson, et al. DAC: The Double Actor-Critic Architecture for Learning Options, 2019, NeurIPS.
[23] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[24] Sergey Levine, et al. Data-Efficient Hierarchical Reinforcement Learning, 2018, NeurIPS.
[25] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[26] Jianbing Shen, et al. Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient, 2020, IEEE Transactions on Neural Networks and Learning Systems.
[27] Behrouz Minaei, et al. A survey of regularization strategies for deep models, 2019, Artificial Intelligence Review.
[28] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, arXiv.
[29] Alexander Ilin, et al. Regularizing Model-Based Planning with Energy-Based Models, 2019, CoRL.
[30] Vivek S. Borkar, et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes, 1999, SIAM J. Control. Optim.
[31] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[32] Punit Pandey, et al. Approximate Q-Learning: An Introduction, 2010, Second International Conference on Machine Learning and Computing.
[33] Pierluca D'Oro, et al. How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization, 2020, NeurIPS.
[34] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[35] Olivier Buffet, et al. Policy-Gradient Algorithms, 2013.
[36] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[37] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[38] Sergey Levine, et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, arXiv.
[39] Tomaso Poggio, et al. Computational vision and regularization theory, 1985, Nature.
[40] Srinjoy Roy, et al. OPAC: Opportunistic Actor-Critic, 2020, arXiv.
[41] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[42] Tomaso A. Poggio, et al. Regularization Theory and Neural Networks Architectures, 1995, Neural Computation.
[43] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[44] Longbo Huang, et al. Softmax Deep Double Deterministic Policy Gradients, 2020, NeurIPS.
[45] Martha White, et al. Maxmin Q-learning: Controlling the Estimation Bias of Q-learning, 2020, ICLR.
[46] Rui Zhao, et al. Maximum Entropy-Regularized Multi-Goal Reinforcement Learning, 2019, ICML.
[47] Amr M. A. Khalifa, et al. On the Reduction of Variance and Overestimation of Deep Q-Learning, 2019, arXiv.