Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs
[1] Le Song, et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation, 2017, ICML.
[2] Shalabh Bhatnagar, et al. Natural Actor-Critic Algorithms, 2009, Autom.
[3] Vicenç Gómez, et al. A Unified View of Entropy-Regularized Markov Decision Processes, 2017, ArXiv.
[4] Marcello Restelli, et al. Smoothing Policies and Safe Policy Gradients, 2019, Machine Learning.
[5] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[6] Matthieu Geist, et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, 2014, ECML/PKDD.
[7] Roy Fox, et al. Taming the Noise in Reinforcement Learning via Soft Updates, 2015, UAI.
[8] Marc Teboulle, et al. Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization, 2003, Oper. Res. Lett.
[9] Ofir Nachum, et al. Path Consistency Learning in Tsallis Entropy Regularized MDPs, 2018, ICML.
[10] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[11] A. Juditsky, et al. First-Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods, 2010.
[12] S. Kakade, et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes, 2019, COLT.
[13] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[14] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[15] Sham M. Kakade. On the Sample Complexity of Reinforcement Learning, 2003 (Ph.D. thesis).
[16] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[17] Angelia Nedic, et al. On Stochastic Subgradient Mirror-Descent Algorithm with Weighted Averaging, 2013, SIAM J. Optim.
[18] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[19] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[20] Amir Beck. First-Order Methods in Optimization, 2017, SIAM.
[21] Dale Schuurmans, et al. Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, 2017, ICLR.
[22] Nicolas Le Roux, et al. Understanding the Impact of Entropy on Policy Optimization, 2018, ICML.
[23] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[24] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[25] Luca Bascetta, et al. Adaptive Step-Size for Policy Gradient Methods, 2013, NIPS.
[26] Jalaj Bhandari, et al. Global Optimality Guarantees for Policy Gradient Methods, 2019, ArXiv.
[27] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[28] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.