Consistent On-Line Off-Policy Evaluation
[1] Shimon Whiteson,et al. Automatic Feature Selection for Model-Based Reinforcement Learning in Factored MDPs , 2009, 2009 International Conference on Machine Learning and Applications.
[2] Shie Mannor,et al. Graying the black box: Understanding DQNs , 2016, ICML.
[3] Huizhen Yu,et al. On Convergence of Emphatic Temporal-Difference Learning , 2015, COLT.
[4] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[5] Richard S. Sutton,et al. Off-policy TD(λ) with a true online equivalence , 2014, UAI.
[6] Ding Wang,et al. Feature selection and feature learning for high-dimensional batch reinforcement learning: A survey , 2015, International Journal of Automation and Computing.
[7] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[8] Enda Barrett,et al. Applying reinforcement learning towards automating resource allocation and application scalability in the cloud , 2013, Concurr. Comput. Pract. Exp..
[9] Martha White,et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning , 2015, J. Mach. Learn. Res..
[10] Takafumi Kanamori,et al. A Least-squares Approach to Direct Importance Estimation , 2009, J. Mach. Learn. Res..
[11] Jian Wang,et al. A novel approach for constructing basis functions in approximate dynamic programming for feedback control , 2013, 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[12] Richard S. Sutton,et al. Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.
[13] Martha White,et al. Incremental Truncated LSTD , 2015, IJCAI.
[14] Scott Niekum,et al. Policy Evaluation Using the Ω-Return , 2015, NIPS.
[15] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[16] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..
[17] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[18] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[19] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1995 .
[20] Kok-Lim Alvin Yau,et al. Application of reinforcement learning to routing in distributed wireless networks: a review , 2013, Artificial Intelligence Review.
[21] Philip S. Thomas,et al. High Confidence Policy Improvement , 2015, ICML.
[22] Yi Sun,et al. Incremental Basis Construction from Temporal Difference Error , 2011, ICML.
[23] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[24] Zhiwei Qin,et al. Sparse Reinforcement Learning via Convex Optimization , 2014, ICML.
[25] Yunmei Chen,et al. Projection Onto A Simplex , 2011, 1101.6081.
[26] Matthew W. Hoffman,et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization , 2011, EWRL.
[27] J. Zico Kolter,et al. The Fixed Points of Off-Policy TD , 2011, NIPS.
[28] Moshe Tennenholtz,et al. Encouraging Physical Activity in Patients With Diabetes Through Automatic Personalized Feedback via Reinforcement Learning Improves Glycemic Control , 2016, Diabetes Care.
[29] Vivek S. Borkar,et al. Feature Search in the Grassmanian in Online Reinforcement Learning , 2013, IEEE Journal of Selected Topics in Signal Processing.
[30] Richard S. Sutton,et al. Off-policy learning based on weighted importance sampling with linear computational complexity , 2015, UAI.
[31] Sridhar Mahadevan,et al. Samuel Meets Amarel: Automating Value Function Approximation Using Global State Space Analysis , 2005, AAAI.
[32] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[33] Arash Givchi,et al. Off-policy temporal difference learning with distribution adaptation in fast mixing chains , 2018, Soft Comput..
[34] Masashi Sugiyama,et al. Feature Selection for Reinforcement Learning: Evaluating Implicit State-Reward Dependency via Conditional Mutual Information , 2010, ECML/PKDD.
[35] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[36] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[37] Dean Stephen Wookey. Representation discovery using a fixed basis in reinforcement learning , 2016 .
[38] Marek Petrik,et al. An Analysis of Laplacian Methods for Value Function Approximation in MDPs , 2007, IJCAI.
[39] Shie Mannor,et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis , 2015, AAAI.
[40] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[41] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[42] George D. Konidaris,et al. Regularized feature selection in reinforcement learning , 2015, Machine Learning.
[43] Frank L. Lewis,et al. Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning , 2013 .
[44] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..
[45] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[46] Marek Petrik,et al. Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes , 2010, ICML.
[47] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[48] Matthieu Geist,et al. A Dantzig Selector Approach to Temporal Difference Learning , 2012, ICML.
[49] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[50] Sridhar Mahadevan,et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..
[51] Shie Mannor,et al. Adaptive Bases for Reinforcement Learning , 2010, ECML/PKDD.
[52] Dimitri P. Bertsekas,et al. Projected Equation Methods for Approximate Solution of Large Linear Systems , 2009, J. Comput. Appl. Math..
[53] Martha White,et al. Investigating Practical Linear Temporal Difference Learning , 2016, AAMAS.
[54] Alessandro Lazaric,et al. LSTD with Random Projections , 2010, NIPS.
[55] Philip S. Thomas,et al. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees , 2015, IJCAI.
[56] Scott Niekum,et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning , 2011, NIPS.
[57] Ronald Parr,et al. Linear Complementarity for Regularized Policy Evaluation and Improvement , 2010, NIPS.
[58] M. Loth,et al. Sparse Temporal Difference Learning Using LASSO , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[59] Bo Liu,et al. Sparse Q-learning with Mirror Descent , 2012, UAI.
[60] Steven J. Bradtke,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 1996, Machine Learning.
[61] William D. Smart. Explicit Manifold Representations for Value-Function Approximation in Reinforcement Learning , 2004, ISAIM.
[62] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[63] Philippe Preux,et al. Basis Expansion in Natural Actor Critic Methods , 2008, EWRL.
[64] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[65] Ronald Parr,et al. Greedy Algorithms for Sparse Reinforcement Learning , 2012, ICML.
[66] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[67] Masashi Sugiyama,et al. Importance-weighted least-squares probabilistic classifier for covariate shift adaptation with application to human activity recognition , 2012, Neurocomputing.
[68] Klaus Obermayer,et al. Construction of approximation spaces for reinforcement learning , 2013, J. Mach. Learn. Res..
[69] Sridhar Mahadevan,et al. Constructing basis functions from directed graphs for value function approximation , 2007, ICML '07.
[70] Doina Precup,et al. A new Q(λ) with interim forward view and Monte Carlo equivalence , 2014, ICML.