[1] Martin A. Riedmiller,et al. Quinoa: a Q-function You Infer Normalized Over Actions , 2019, ArXiv.
[2] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[3] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[4] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[5] Bruno Scherrer,et al. Leverage the Average: an Analysis of Regularization in RL , 2020, ArXiv.
[6] Marcin Andrychowicz,et al. Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.
[7] Dale Schuurmans,et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control , 2017, ICLR.
[8] Avishek Joey Bose,et al. Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies , 2019, ArXiv.
[9] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[10] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[11] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[12] Sergio Gomez Colmenarejo,et al. Acme: A Research Framework for Distributed Reinforcement Learning , 2020, ArXiv.
[13] O. Pietquin,et al. Munchausen Reinforcement Learning , 2020, NeurIPS.
[14] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[15] Marc G. Bellemare,et al. Increasing the Action Gap: New Operators for Reinforcement Learning , 2015, AAAI.
[16] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, ArXiv.
[17] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[18] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[19] Matthieu Geist,et al. Approximate modified policy iteration and its application to the game of Tetris , 2015, J. Mach. Learn. Res..
[20] Shakir Mohamed,et al. Variational Inference with Normalizing Flows , 2015, ICML.
[21] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010.
[22] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[23] Leon Hirsch,et al. Fundamentals of Convex Analysis , 2016.
[24] Matthieu Geist,et al. Is the Bellman residual a bad proxy? , 2016, NIPS.
[25] Yunhao Tang,et al. Discretizing Continuous Action Space for On-Policy Optimization , 2019, AAAI.
[26] Yuval Tassa,et al. Maximum a Posteriori Policy Optimisation , 2018, ICLR.
[27] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[28] Navdeep Jaitly,et al. Discrete Sequential Prediction of Continuous Actions for Deep RL , 2017, ArXiv.
[29] Sergey Levine,et al. Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations , 2017, Robotics: Science and Systems.