John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
K. Ball. An Elementary Introduction to Modern Convex Geometry, 1997.
Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning, 1992, Machine Learning.
Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
Csaba Szepesvári, et al. Finite time bounds for sampling based fitted value iteration, 2005, ICML.
Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
Adrian S. Lewis, et al. The Łojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems, 2006, SIAM J. Optim.
Alessandro Lazaric, et al. Analysis of a Classification-based Policy Iteration Algorithm, 2010, ICML.
Hédy Attouch, et al. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality, 2008, Math. Oper. Res.
Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
Sham M. Kakade, et al. Towards Minimax Policies for Online Linear Optimization with Bandit Feedback, 2012, COLT.
Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
Matthieu Geist, et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, 2014, ECML/PKDD.
Shai Ben-David, et al. Understanding Machine Learning - From Theory to Algorithms, 2014.
Csaba Szepesvári, et al. Online Markov Decision Processes Under Bandit Feedback, 2014, IEEE Transactions on Automatic Control.
Bruno Scherrer. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res.
Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
Saeed Ghadimi, et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming, 2016, Math. Program.
Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2017, ICML.
Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
Prateek Jain, et al. Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification, 2016, J. Mach. Learn. Res.
Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for Linearized Control Problems, 2018, ICML.
Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, 2019, ICML.
Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
Nicolas Le Roux, et al. Understanding the impact of entropy on policy optimization, 2019, ICML.
Jalaj Bhandari, et al. Global Optimality Guarantees For Policy Gradient Methods, 2019, ArXiv.
Neural Temporal-Difference Learning Converges to Global Optima, 2019, NeurIPS.
Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2020, COLT.