Direct and indirect reinforcement learning
Jingliang Duan | Shengbo Eben Li | Yang Guan | Yangang Ren | Jie Li | Bo Cheng
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] R. Bellman. Dynamic Programming , 1957, Science.
[3] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[4] Sheldon M. Ross. Stochastic Processes , 1996, Wiley.
[5] Donald A. Sofge,et al. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , 1992 .
[6] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[7] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[8] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[9] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[10] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[11] John N. Tsitsiklis,et al. Gradient Convergence in Gradient methods with Errors , 1999, SIAM J. Optim..
[12] John N. Tsitsiklis,et al. Simulation-based optimization of Markov reward processes , 2001, IEEE Trans. Autom. Control..
[13] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[14] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[15] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[16] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[17] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[18] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[19] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[20] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[21] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[22] Warren B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality , 2007, Wiley Series in Probability and Statistics.
[23] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[24] Huaguang Zhang,et al. An Overview of Research on Adaptive Dynamic Programming , 2013, Acta Automatica Sinica.
[25] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[26] Derong Liu,et al. Action dependent heuristic dynamic programming for home energy resource scheduling , 2013 .
[27] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[28] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[29] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[30] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[31] Haibo He,et al. Model-Free Dual Heuristic Dynamic Programming , 2015, IEEE Transactions on Neural Networks and Learning Systems.
[32] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[33] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[34] David Silver,et al. Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.
[35] Kevin Gimpel,et al. Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units , 2016, ArXiv.
[36] Piotr Gierlak,et al. Globalized Dual Heuristic Dynamic Programming in Control of Robotic Manipulator , 2016 .
[37] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[38] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[39] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[40] Koray Kavukcuoglu,et al. PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.
[41] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[42] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[43] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[44] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.
[45] Elman Mansimov,et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation , 2017, NIPS.
[46] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[47] Koray Kavukcuoglu,et al. Combining policy gradient and Q-learning , 2016, ICLR.
[48] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[49] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[50] Richard E. Turner,et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.
[51] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[52] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[53] Marc G. Bellemare,et al. A Distributional Perspective on Reinforcement Learning , 2017, ICML.
[54] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[55] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[56] Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.
[57] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[58] Honglak Lee,et al. Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion , 2018, NeurIPS.
[59] Tom Schaul,et al. Rainbow: Combining Improvements in Deep Reinforcement Learning , 2017, AAAI.
[60] David Budden,et al. Distributed Prioritized Experience Replay , 2018, ICLR.
[61] Pieter Abbeel,et al. Model-Ensemble Trust-Region Policy Optimization , 2018, ICLR.
[62] Sergey Levine,et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , 2016, Int. J. Robotics Res..
[63] Herke van Hoof,et al. Addressing Function Approximation Error in Actor-Critic Methods , 2018, ICML.
[64] Benjamin Recht,et al. Simple random search of static linear policies is competitive for reinforcement learning , 2018, NeurIPS.
[65] Erik Talvitie,et al. The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces , 2018, ArXiv.
[66] Shane Legg,et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.
[67] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[68] Martha White,et al. An Off-policy Policy Gradient Theorem Using Emphatic Weightings , 2018, NeurIPS.
[69] Dale Schuurmans,et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control , 2017, ICLR.
[70] Sergey Levine,et al. Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning , 2018, ArXiv.
[71] Jürgen Schmidhuber,et al. Recurrent World Models Facilitate Policy Evolution , 2018, NeurIPS.
[72] Matthew W. Hoffman,et al. Distributed Distributional Deterministic Policy Gradients , 2018, ICLR.
[73] Satinder Singh,et al. Self-Imitation Learning , 2018, ICML.
[74] Marc G. Bellemare,et al. The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning , 2017, ICLR.
[75] Shengbo Eben Li,et al. Generalized Policy Iteration for Optimal Control in Continuous Time , 2019, ArXiv.
[76] Wojciech M. Czarnecki,et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.
[77] Shimon Whiteson,et al. Generalized Off-Policy Actor-Critic , 2019, NeurIPS.
[78] Sergey Levine,et al. When to Trust Your Model: Model-Based Policy Optimization , 2019, NeurIPS.
[79] Jimmy Ba,et al. Dream to Control: Learning Behaviors by Latent Imagination , 2019, ICLR.
[80] Shengbo Eben Li,et al. Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function , 2020, ArXiv.
[81] P. Abbeel,et al. Reinforcement Learning with Augmented Data , 2020, NeurIPS.
[82] Pieter Abbeel,et al. CURL: Contrastive Unsupervised Representations for Reinforcement Learning , 2020, ICML.
[83] Zhengyu Liu,et al. Deep adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints , 2019, ArXiv.