Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes
[1] R. Bellman, et al. Functional Approximations and Dynamic Programming, 1959.
[2] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983.
[3] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[4] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[5] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[6] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[7] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[8] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[9] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[10] Jeff G. Schneider, et al. Covariant policy search, 2003, IJCAI.
[11] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[12] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.
[13] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[14] Csaba Szepesvári, et al. Finite time bounds for sampling based fitted value iteration, 2005, ICML.
[15] Rémi Munos, et al. Error Bounds for Approximate Value Iteration, 2005, AAAI.
[16] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[17] Yurii Nesterov, et al. Cubic regularization of Newton method and its global performance, 2006, Math. Program.
[18] Csaba Szepesvári, et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path, 2006, COLT.
[19] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[20] Adrian S. Lewis, et al. The Łojasiewicz Inequality for Nonsmooth Subanalytic Functions with Applications to Subgradient Dynamical Systems, 2006, SIAM J. Optim.
[21] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[22] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[23] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[24] Hédy Attouch, et al. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality, 2008, Math. Oper. Res.
[25] Csaba Szepesvári, et al. Error Propagation for Approximate Policy and Value Iteration, 2010, NIPS.
[26] Hilbert J. Kappen, et al. Dynamic policy programming, 2010, J. Mach. Learn. Res.
[27] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.
[28] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[29] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[30] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[31] Matthieu Geist, et al. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search, 2014, ECML/PKDD.
[32] Csaba Szepesvári, et al. Online Markov Decision Processes Under Bandit Feedback, 2010, IEEE Transactions on Automatic Control.
[33] Bruno Scherrer, et al. Approximate Policy Iteration Schemes: A Comparison, 2014, ICML.
[34] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[35] Matthieu Geist, et al. Approximate modified policy iteration and its application to the game of Tetris, 2015, J. Mach. Learn. Res.
[36] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[37] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[38] Saeed Ghadimi, et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming, 2013, Math. Program.
[39] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[40] Prateek Jain, et al. Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging, 2016, ArXiv.
[41] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[42] Amir Beck, et al. First-Order Methods in Optimization, 2017.
[43] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[44] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[45] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[46] Prateek Jain, et al. A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares), 2017, FSTTCS.
[47] Nan Jiang, et al. On Oracle-Efficient PAC RL with Rich Observations, 2018, NeurIPS.
[48] Sham M. Kakade, et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator, 2018, ICML.
[49] Yuval Tassa, et al. Maximum a Posteriori Policy Optimisation, 2018, ICLR.
[50] Qi Cai, et al. Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy, 2019, ArXiv.
[51] Peter L. Bartlett, et al. POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, 2019, ICML.
[52] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[53] Matthieu Geist, et al. A Theory of Regularized Markov Decision Processes, 2019, ICML.
[54] Nicolas Le Roux, et al. Understanding the impact of entropy on policy optimization, 2018, ICML.
[55] Jalaj Bhandari, et al. Global Optimality Guarantees For Policy Gradient Methods, 2019, ArXiv.
[56] J. Lee, et al. Neural Temporal-Difference Learning Converges to Global Optima, 2019, NeurIPS.