First-order Policy Optimization for Robust Markov Decision Process
[1] Shie Mannor, et al. Efficient Policy Iteration for Robust Markov Decision Processes via Regularization, 2022, arXiv.
[2] Shaofeng Zou, et al. Policy Gradient Method For Robust Reinforcement Learning, 2022, ICML.
[3] Guanghui Lan, et al. Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity, 2022, arXiv.
[4] Lin Xiao. On the Convergence Rates of Policy Gradient Methods, 2022, J. Mach. Learn. Res.
[5] D. Kalathil, et al. Sample Complexity of Robust Reinforcement Learning with a Generative Model, 2021, AISTATS.
[6] Yuxin Chen, et al. Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization, 2020, Oper. Res.
[7] Vineet Goyal, et al. A First-Order Approach to Accelerated Value Iteration, 2019, Oper. Res.
[8] Vineet Goyal, et al. Robust Markov Decision Process: Beyond Rectangularity, 2018, arXiv:1811.00215.
[9] Shie Mannor, et al. Twice regularized MDPs and the equivalence between robustness and regularization, 2021, NeurIPS.
[10] Shaofeng Zou, et al. Online Robust Reinforcement Learning with Model Uncertainty, 2021, NeurIPS.
[11] Prakirt Raj Jhunjhunwala, et al. On the Linear Convergence of Natural Policy Gradient Algorithm, 2021, 60th IEEE Conference on Decision and Control (CDC).
[12] Siva Theja Maguluri, et al. A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants, 2021, arXiv.
[13] Guanghui Lan. Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes, 2021, Mathematical Programming.
[14] Dileep Kalathil, et al. Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees, 2020, ICML.
[15] Wolfram Wiesemann, et al. Partial Policy Iteration for L1-Robust Markov Decision Processes, 2020, J. Mach. Learn. Res.
[16] Christian Kroer, et al. Scalable First-Order Methods for Robust MDPs, 2020, AAAI.
[17] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[18] Tuo Zhao, et al. Implicit Bias of Gradient Descent based Adversarial Training on Separable Data, 2020, ICLR.
[19] Yingbin Liang, et al. Improving Sample Complexity Bounds for Actor-Critic Algorithms, 2020, arXiv.
[20] Tuo Zhao, et al. Deep Reinforcement Learning with Robust and Smooth Policy, 2020, ICML.
[21] Andrzej Ruszczynski, et al. Risk-Averse Learning by Temporal Difference Methods, 2020, arXiv.
[22] Shie Mannor, et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2019, AAAI.
[23] Guanghui Lan, et al. First-order and Stochastic Optimization Methods for Machine Learning, 2020.
[24] Aleksander Madry, et al. Robustness May Be at Odds with Accuracy, 2018, ICLR.
[25] Marcin Andrychowicz, et al. Overcoming Exploration in Reinforcement Learning with Demonstrations, 2018, IEEE International Conference on Robotics and Automation (ICRA).
[26] Aurko Roy, et al. Reinforcement Learning under Model Mismatch, 2017, NIPS.
[27] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[28] Shie Mannor, et al. Scaling Up Robust MDPs using Function Approximation, 2014, ICML.
[29] Andrew J. Schaefer, et al. Robust Modified Policy Iteration, 2013, INFORMS J. Comput.
[30] Daniel Kuhn, et al. Robust Markov Decision Processes, 2013, Math. Oper. Res.
[31] Andrzej Ruszczynski, et al. Risk-averse dynamic programming for Markov decision processes, 2010, Math. Program.
[32] Shie Mannor, et al. Robust Regression and Lasso, 2008, IEEE Transactions on Information Theory.
[33] Shie Mannor, et al. Robustness and Regularization of Support Vector Machines, 2008, J. Mach. Learn. Res.
[34] D. Vittone. Introduction to Geometric Measure Theory, 2006.
[35] Laurent El Ghaoui, et al. Robust Control of Markov Decision Processes with Uncertain Transition Matrices, 2005, Oper. Res.
[36] Garud Iyengar, et al. Robust Dynamic Programming, 2005, Math. Oper. Res.
[37] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, 2004, Applied Optimization.
[38] A. Kruger. On Fréchet Subdifferentials, 2003.
[39] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[40] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[41] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[42] J. Vial, et al. Strong Convexity of Sets and Functions, 1982.
[43] J. Danskin. The Theory of Max-Min and its Application to Weapons Allocation Problems, 1967.