Regularized Off-Policy TD-Learning
Bo Liu | Sridhar Mahadevan | Ji Liu