暂无分享,去创建一个
[1] G. Pflug. Stochastic Approximation Methods for Constrained and Unconstrained Systems - Kushner, HJ.; Clark, D.S. , 1980 .
[2] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[3] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[4] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[5] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[6] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[7] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[8] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[9] L. A. Prashanth,et al. Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games , 2014 .
[10] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[11] Hilbert J. Kappen,et al. Dynamic policy programming , 2010, J. Mach. Learn. Res..
[12] Koray Kavukcuoglu,et al. Combining policy gradient and Q-learning , 2016, ICLR.
[13] Tamer Basar,et al. Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents , 2018, ICML.
[14] Pieter Abbeel,et al. Equivalence Between Policy Gradients and Soft Q-Learning , 2017, ArXiv.
[15] M. T. Wasan. Stochastic Approximation , 1969 .
[16] Sergey Levine,et al. Reinforcement Learning with Deep Energy-Based Policies , 2017, ICML.
[17] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[18] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[19] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[20] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[21] Koray Kavukcuoglu,et al. PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.
[22] V. Borkar,et al. Stochastic approximation , 2013, Resonance.