[1] Dale Schuurmans, et al. Direct value-approximation for factored MDPs, 2001, NIPS.
[2] A. Juditsky, et al. First-Order Methods for Nonsmooth Convex Large-Scale Optimization, I: General Purpose Methods, 2010.
[3] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[4] Gergely Neu, et al. Faster saddle-point optimization for solving large-scale Markov decision processes, 2020, L4DC.
[5] Randy Cogill, et al. Primal-dual algorithms for discounted Markov decision processes, 2015, European Control Conference (ECC).
[6] S. Sra, S. Nowozin, and S. J. Wright (eds.). Optimization for Machine Learning, 2011, MIT Press.
[7] D. J. White, et al. A Survey of Applications of Markov Decision Processes, 1993.
[8] Yasin Abbasi-Yadkori, et al. Optimizing over a Restricted Policy Class in MDPs, 2019, AISTATS.
[9] Richard J. Boucherie, et al. Markov decision processes in practice, 2017.
[10] Tao Wang, et al. Stable Dual Dynamic Programming, 2007, NIPS.
[11] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[12] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[13] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[14] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[15] Vivek F. Farias, et al. A Smoothed Approximate Linear Program, 2009, NIPS.
[16] John Rust. Numerical dynamic programming in economics, 1996.
[17] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[18] Mykel J. Kochenderfer, et al. Limiting Extrapolation in Linear Approximate Value Iteration, 2019, NeurIPS.
[19] Rémi Munos, et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, 2014, Found. Trends Mach. Learn.
[20] Benjamin Van Roy, et al. On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming, 2004, Math. Oper. Res.
[21] A. Juditsky, et al. Solving variational inequalities with Stochastic Mirror-Prox algorithm, 2008, arXiv:0809.0815.
[22] P. Schweitzer, et al. Generalized polynomial approximations in Markovian decision processes, 1985.
[23] Xi Chen, et al. Large-Scale Markov Decision Problems via the Linear Programming Dual, 2019, arXiv.
[24] Ruosong Wang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2020, ICLR.
[25] Chi Jin. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2020.
[26] Jean-Paul Chilès, et al. Wiley Series in Probability and Statistics, 2012.
[27] Peter L. Bartlett, et al. Linear Programming for Large-Scale Markov Decision Problems, 2014, ICML.
[28] Mengdi Wang, et al. Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning, 2016, arXiv.
[29] Zheng Wen, et al. Efficient Exploration and Value Function Generalization in Deterministic Systems, 2013, NIPS.
[30] U. Rieder, et al. Markov Decision Processes, 2010.
[31] A. Juditsky. First-Order Methods for Nonsmooth Convex Large-Scale Optimization, II: Utilizing Problem’s Structure, 2010.
[32] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[33] Yao-Liang Yu. The Strong Convexity of von Neumann’s Entropy, 2015.
[34] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[35] Shalabh Bhatnagar, et al. A Linearly Relaxed Approximate Linear Program for Markov Decision Processes, 2017, IEEE Transactions on Automatic Control.
[36] Csaba Szepesvári, et al. Efficient approximate planning in continuous space Markovian Decision Problems, 2001, AI Commun.
[37] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[38] Marek Petrik, et al. Constraint relaxation in approximate linear programs, 2009, ICML.
[39] Benjamin Van Roy, et al. Comments on the Du-Kakade-Wang-Yang Lower Bounds, 2019, arXiv.
[40] Tor Lattimore, et al. Learning with Good Feature Representations in Bandits and in RL with a Generative Model, 2020, ICML.
[41] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[42] Nan Jiang, et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[43] Vivek F. Farias, et al. Non-parametric Approximate Dynamic Programming via the Kernel Method, 2012, NIPS.
[44] John N. Tsitsiklis, et al. A survey of computational complexity results in systems and control, 2000, Autom.
[45] Carmel Domshlak, et al. Simple Regret Optimization in Online Planning for Markov Decision Processes, 2012, J. Artif. Intell. Res.
[46] Lihong Li, et al. Scalable Bilinear π Learning Using State and Action Features, 2018, ICML.
[47] Ruosong Wang, et al. Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle, 2019, NeurIPS.