[1] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[2] Guanghui Lan, et al. Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning, 2020, SIAM J. Optim.
[3] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[4] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[5] Yuxin Chen, et al. Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization, 2020, Oper. Res.
[6] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.
[7] F. Facchinei, et al. Finite-Dimensional Variational Inequalities and Complementarity Problems, 2003.
[8] Shie Mannor, et al. Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs, 2020, AAAI.
[9] Guanghui Lan, et al. Simple and optimal methods for stochastic variational inequalities, I: operator extrapolation, 2020, SIAM J. Optim.
[10] Guanghui Lan. First-order and Stochastic Optimization Methods for Machine Learning, 2020.
[11] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, arXiv.
[12] A. Nemirovski, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[13] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[14] Dale Schuurmans, et al. On the Global Convergence Rates of Softmax Policy Gradient Methods, 2020, ICML.
[15] Jalaj Bhandari, et al. A Note on the Linear Convergence of Policy Gradient Methods, 2020, arXiv.
[16] Guanghui Lan, et al. On the convergence properties of non-Euclidean extragradient methods for variational inequalities with generalized monotone operators, 2013, Comput. Optim. Appl.
[17] Zhaoran Wang, et al. Neural Policy Gradient Methods: Global Optimality and Rates of Convergence, 2019, ICLR.
[18] R. Bellman, et al. Functional Approximations and Dynamic Programming, 1959.
[19] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[20] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[21] Alexander Shapiro, et al. Robust Stochastic Approximation Approach to Stochastic Programming, 2009, SIAM J. Optim.
[22] Yingbin Liang, et al. Improving Sample Complexity Bounds for Actor-Critic Algorithms, 2020, arXiv.