Proximal Gradient Temporal Difference Learning Algorithms
Marek Petrik | Bo Liu | Ji Liu | Sridhar Mahadevan | Mohammad Ghavamzadeh
[1] Stephen J. Wright, et al. Optimization for Machine Learning, 2013.
[2] Yunmei Chen, et al. Optimal Primal-Dual Methods for a Class of Saddle Point Problems, 2013, SIAM J. Optim.
[3] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS 2008.
[4] Marek Petrik, et al. Finite-Sample Analysis of Proximal Gradient TD Algorithms, 2015, UAI.
[5] Ali H. Sayed, et al. Distributed Policy Evaluation Under Multiple Behavior Strategies, 2013, IEEE Transactions on Automatic Control.
[6] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[7] Heinz H. Bauschke, et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2011, CMS Books in Mathematics.
[8] Jan Peters, et al. Policy evaluation with temporal differences: a survey and comparison, 2015, J. Mach. Learn. Res.
[9] A. Juditsky, et al. Solving variational inequalities with Stochastic Mirror-Prox algorithm, 2008, arXiv:0809.0815.
[10] R. Sutton, et al. Gradient temporal-difference learning algorithms, 2011.
[11] Csaba Szepesvári, et al. Statistical linear estimation with penalized estimators: an application to reinforcement learning, 2012, ICML.
[12] Zhiwei Qin, et al. Sparse Reinforcement Learning via Convex Optimization, 2014, ICML.
[13] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[14] Bo Liu, et al. Regularized Off-Policy TD-Learning, 2012, NIPS.
[15] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[16] Bo Liu, et al. Sparse Q-learning with Mirror Descent, 2012, UAI.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Bo Liu, et al. Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces, 2014, ArXiv.
[19] Sébastien Bubeck, et al. Theory of Convex Optimization for Machine Learning, 2014, ArXiv.
[20] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[21] Jean-Yves Audibert. Optimization for Machine Learning, 1995.
[22] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[23] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint, 2008.