Alec Koppel | Mengdi Wang | Amrit Singh Bedi | Csaba Szepesvari | Junyu Zhang
[1] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[2] Peter W. Glynn,et al. Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning , 2019, ICML.
[3] E. Altman. Constrained Markov Decision Processes , 1999 .
[4] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[5] Mehran Mesbahi,et al. LQR through the Lens of First Order Methods: Discrete-time Case , 2019, ArXiv.
[6] Gergely Neu,et al. Online learning in episodic Markovian decision processes by relative entropy policy search , 2013, NIPS.
[7] Vivek S. Borkar,et al. Actor-Critic-Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[8] Alexandre B. Tsybakov,et al. Introduction to Nonparametric Estimation , 2008, Springer series in statistics.
[9] D K Smith,et al. Numerical Optimization , 2001, J. Oper. Res. Soc..
[10] Ying Huang,et al. On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs , 1994, Math. Oper. Res..
[11] Ambuj Tewari,et al. Regularization Techniques for Learning with Matrices , 2009, J. Mach. Learn. Res..
[12] Sham M. Kakade,et al. Provably Efficient Maximum Entropy Exploration , 2018, ICML.
[13] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[14] Zhaoran Wang,et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence , 2019, ICLR.
[15] Luca Bascetta,et al. Adaptive Step-Size for Policy Gradient Methods , 2013, NIPS.
[16] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[17] Sham M. Kakade,et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator , 2018, ICML.
[18] John N. Tsitsiklis,et al. Mean-Variance Optimization in Markov Decision Processes , 2011, ICML.
[19] Sham M. Kakade,et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift , 2019, J. Mach. Learn. Res..
[20] B. V. Dean,et al. Studies in Linear and Non-Linear Programming. , 1959 .
[21] Long Ji Lin,et al. Reinforcement Learning of Non-Markov Decision Processes , 1995, Artif. Intell..
[22] John Langford,et al. Approximately Optimal Approximate Reinforcement Learning , 2002, ICML.
[23] Yaoliang Yu,et al. A General Projection Property for Distribution Families , 2009, NIPS.
[24] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[25] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[26] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[27] Quanquan Gu,et al. Sample Efficient Policy Gradient Methods with Recursive Variance Reduction , 2020, ICLR.
[28] Dmitriy Drusvyatskiy,et al. Efficiency of minimizing compositions of convex functions and smooth maps , 2016, Math. Program..
[29] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[30] Alec Koppel,et al. Cautious Reinforcement Learning via Distributional Risk in the Dual Domain , 2020, IEEE Journal on Selected Areas in Information Theory.
[31] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[32] Dale Schuurmans,et al. On the Global Convergence Rates of Softmax Policy Gradient Methods , 2020, ICML.
[33] Marcello Restelli,et al. Stochastic Variance-Reduced Policy Gradient , 2018, ICML.
[34] Mengdi Wang,et al. Generalization Bounds for Stochastic Saddle Point Problems , 2020, AISTATS.
[35] Bo Dai,et al. Reinforcement Learning via Fenchel-Rockafellar Duality , 2020, ArXiv.
[36] H. Robbins. A Stochastic Approximation Method , 1951 .
[37] Hao Zhu,et al. Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies , 2019, SIAM J. Control. Optim..
[38] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[39] Pieter Abbeel,et al. Constrained Policy Optimization , 2017, ICML.
[40] Alexander Shapiro,et al. Lectures on Stochastic Programming: Modeling and Theory , 2009 .
[41] Jalaj Bhandari,et al. Global Optimality Guarantees For Policy Gradient Methods , 2019, ArXiv.
[42] Qi Cai,et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy , 2019, ArXiv.
[43] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[44] Lodewijk C. M. Kallenberg,et al. Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory , 1994, Math. Methods Oper. Res..
[45] J. Kiefer,et al. Stochastic Estimation of the Maximum of a Regression Function , 1952 .
[46] John S. Edwards,et al. Linear Programming and Finite Markovian Control Problems , 1983 .
[47] Stefan Schaal,et al. Learning from Demonstration , 1996, NIPS.
[48] Sean P. Meyn,et al. Risk-Sensitive Optimal Control for Markov Decision Processes with Monotone Cost , 2002, Math. Oper. Res..
[49] L. Takács,et al. Non-Markovian Processes , 1966 .
[50] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[51] C. Derman,et al. Some Remarks on Finite Horizon Markovian Decision Models , 1965 .
[52] Luca Bascetta,et al. Policy gradient in Lipschitz Markov Decision Processes , 2015, Machine Learning.
[53] Jerzy A. Filar,et al. Variance-Penalized Markov Decision Processes , 1989, Math. Oper. Res..
[54] Sham M. Kakade,et al. Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes , 2019, COLT.