[1] Sheldon M. Ross. Stochastic Processes, 1996, Wiley.
[2] Martha White, et al. Linear Off-Policy Actor-Critic, 2012, ICML.
[3] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.
[4] Martha White, et al. An Off-policy Policy Gradient Theorem Using Emphatic Weightings, 2018, NeurIPS.
[5] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, arXiv.
[6] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[7] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[8] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[9] Jürgen Schmidhuber, et al. Recurrent World Models Facilitate Policy Evolution, 2018, NeurIPS.
[10] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[11] Koray Kavukcuoglu, et al. PGQ: Combining policy gradient and Q-learning, 2016, arXiv.
[12] John N. Tsitsiklis, et al. Gradient Convergence in Gradient Methods with Errors, 1999, SIAM J. Optim.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[14] Shimon Whiteson, et al. Generalized Off-Policy Actor-Critic, 2019, NeurIPS.
[15] R. J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[16] Derong Liu, et al. Action dependent heuristic dynamic programming for home energy resource scheduling, 2013.
[17] Haibo He, et al. Model-Free Dual Heuristic Dynamic Programming, 2015, IEEE Transactions on Neural Networks and Learning Systems.
[18] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.
[19] Honglak Lee, et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion, 2018, NeurIPS.
[20] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[21] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[22] Sergey Levine, et al. Continuous Deep Q-Learning with Model-based Acceleration, 2016, ICML.
[23] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[24] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[25] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[26] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[27] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[28] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, arXiv.
[29] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[30] Marc G. Bellemare, et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning, 2017, ICLR.
[31] Sergey Levine, et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning, 2018, arXiv.
[32] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2007, Wiley Series in Probability and Statistics.
[33] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1994, Machine Learning.
[34] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[35] Richard E. Turner, et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, 2017, NIPS.
[36] Warren B. Powell, et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2007.
[37] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[38] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.
[39] R. A. Howard. Dynamic Programming and Markov Processes, 1960.
[40] Pieter Abbeel, et al. Model-Ensemble Trust-Region Policy Optimization, 2018, ICLR.
[41] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[42] Razvan Pascanu, et al. Imagination-Augmented Agents for Deep Reinforcement Learning, 2017, NIPS.
[43] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[44] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[45] Herke van Hoof, et al. Addressing Function Approximation Error in Actor-Critic Methods, 2018, ICML.
[46] Shengbo Eben Li, et al. Generalized Policy Iteration for Optimal Control in Continuous Time, 2019, arXiv.
[47] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[48] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[49] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[50] David Budden, et al. Distributed Prioritized Experience Replay, 2018, ICLR.
[51] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[52] Koray Kavukcuoglu, et al. Combining policy gradient and Q-learning, 2016, ICLR.
[53] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[54] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[55] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[56] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[57] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[58] Satinder Singh, et al. Self-Imitation Learning, 2018, ICML.
[59] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[60] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[61] Piotr Gierlak, et al. Globalized Dual Heuristic Dynamic Programming in Control of Robotic Manipulator, 2016.
[62] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[63] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[64] Zhengyu Liu, et al. Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints, 2019, arXiv.
[65] Huaguang Zhang, et al. An Overview of Research on Adaptive Dynamic Programming, 2013, Acta Automatica Sinica.
[66] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[67] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.