Bo Liu | Ji Liu | Sridhar Mahadevan | Mohammad Ghavamzadeh | Marek Petrik