Policy evaluation with temporal differences: a survey and comparison
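The survey's subject is temporal-difference (TD) policy evaluation. As context for the references below, here is a minimal sketch of semi-gradient TD(0) with linear value-function approximation, the basic algorithm that most of the cited work extends; the function names, the random-walk data generator, and all hyperparameters are assumptions made for this sketch, not material from the survey.

    import numpy as np

    def td0_linear(transitions, phi, n_features, gamma=0.95, alpha=0.05, passes=20):
        # Semi-gradient TD(0): V(s) is approximated by theta @ phi(s), and theta is
        # nudged toward the bootstrapped target r + gamma * V(s') for each transition.
        theta = np.zeros(n_features)
        for _ in range(passes):
            for s, r, s_next, done in transitions:
                v = theta @ phi(s)
                v_next = 0.0 if done else theta @ phi(s_next)
                delta = r + gamma * v_next - v      # TD error
                theta += alpha * delta * phi(s)     # semi-gradient step
        return theta

    # Illustrative use: a 5-state random walk with one-hot (tabular) features and
    # reward 1 on stepping off the right end; true values are 1/6, 2/6, ..., 5/6.
    rng = np.random.default_rng(0)
    phi = lambda s: np.eye(5)[s]
    transitions, s = [], 2
    for _ in range(5000):
        step = rng.choice([-1, 1])
        s_next, done = s + step, s + step in (-1, 5)
        r = 1.0 if s + step == 5 else 0.0
        transitions.append((s, r, min(max(s_next, 0), 4), done))
        s = 2 if done else s_next
    theta = td0_linear(transitions, phi, n_features=5, gamma=1.0)

Batch counterparts such as LSTD (Bradtke and Barto; Boyan, below) solve directly for the fixed point of these updates rather than iterating them, one of the contrasts reflected in the references that follow.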
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] M. Rosenblatt. Markov Processes, Structure and Asymptotic Behavior , 1971 .
[3] James S. Albus, et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller, 1975.
[4] James S. Albus, et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), 1975.
[5] P. Schweitzer,et al. Generalized polynomial approximations in Markovian decision processes , 1985 .
[6] Donald L. Iglehart,et al. Importance sampling for stochastic simulations , 1989 .
[7] Y. C. Pati,et al. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.
[8] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[9] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[10] Jean-Yves Audibert. Optimization for Machine Learning , 1995 .
[11] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[12] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[13] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[14] Charles W. Anderson,et al. Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).
[15] Doina Precup,et al. Intra-Option Learning about Temporally Abstract Actions , 1998, ICML.
[16] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[17] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[18] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[19] Ralf Schoknecht,et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation , 2002, NIPS.
[20] Carl E. Rasmussen,et al. Gaussian Processes in Reinforcement Learning , 2003, NIPS.
[21] Shie Mannor,et al. Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning , 2003, ICML.
[22] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[23] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[24] R. Tibshirani,et al. Least angle regression , 2004, math/0406456.
[25] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[26] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[27] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[28] Shie Mannor,et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning , 2005, Ann. Oper. Res..
[29] Yaakov Engel, et al. Algorithms and Representations for Reinforcement Learning, 2005.
[30] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[31] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[32] Shie Mannor,et al. A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..
[33] David Choi,et al. A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning , 2001, Discret. Event Dyn. Syst..
[34] Shie Mannor,et al. Automatic basis function construction for approximate dynamic programming and reinforcement learning , 2006, ICML.
[35] Alborz Geramifard,et al. iLSTD: Eligibility Traces and Convergence Analysis , 2006, NIPS.
[36] Csaba Szepesvári,et al. Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path , 2006, COLT.
[37] Alborz Geramifard,et al. Incremental Least-Squares Temporal Difference Learning , 2006, AAAI.
[38] Xin Xu,et al. Kernel Least-Squares Temporal Difference Learning , 2006 .
[39] Daniel Polani,et al. Least Squares SVM for Least Squares TD Learning , 2006, ECAI.
[40] H. Robbins. A Stochastic Approximation Method , 1951 .
[41] Terence Tao, et al. The Dantzig selector: Statistical estimation when p is much larger than n, 2005, math/0506081.
[42] Martin A. Riedmiller,et al. On Experiences in a Complex and Competitive Gaming Domain: Reinforcement Learning Meets RoboCup , 2007, 2007 IEEE Symposium on Computational Intelligence and Games.
[43] Shane Legg,et al. Temporal Difference Updating without a Learning Rate , 2007, NIPS.
[44] M. Loth,et al. Sparse Temporal Difference Learning Using LASSO , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[45] Richard S. Sutton,et al. Reinforcement Learning of Local Shape in the Game of Go , 2007, IJCAI.
[46] Lihong Li,et al. Analyzing feature generation for value-function approximation , 2007, ICML '07.
[47] Sridhar Mahadevan,et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes , 2007, J. Mach. Learn. Res..
[48] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.
[49] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[50] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[51] Lihong Li,et al. A worst-case comparison between temporal difference and residual gradient with linear function approximation , 2008, ICML '08.
[52] David Silver, et al. Achieving Master Level Play in 9×9 Computer Go, 2008, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence.
[53] Shie Mannor,et al. Reinforcement learning in the presence of rare events , 2008, ICML '08.
[54] Shie Mannor,et al. Regularized Policy Iteration , 2008, NIPS.
[55] Gavin Taylor,et al. Kernelized value function approximation for reinforcement learning , 2009, ICML '09.
[56] P. Zhao,et al. The composite absolute penalties family for grouped and hierarchical variable selection , 2009, 0909.0411.
[57] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[58] Richard W. Cottle,et al. Linear Complementarity Problem , 2009, Encyclopedia of Optimization.
[59] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[60] D. Bertsekas, et al. Projected Equation Methods for Approximate Solution of Large Linear Systems, 2009, Journal of Computational and Applied Mathematics.
[61] S. Mahadevan,et al. Sparse Approximate Policy Evaluation using Graph-based Basis Functions , 2009 .
[62] Julien Mairal,et al. Proximal Methods for Sparse Hierarchical Dictionary Learning , 2010, ICML.
[63] Bruno Scherrer,et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view , 2010, ICML.
[64] Marek Petrik,et al. Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes , 2010, ICML.
[65] Csaba Szepesvári,et al. Model Selection in Reinforcement Learning , 2011, Machine Learning.
[66] Ronald Parr,et al. Linear Complementarity for Regularized Policy Evaluation and Improvement , 2010, NIPS.
[67] Masashi Sugiyama,et al. Feature Selection for Reinforcement Learning: Evaluating Implicit State-Reward Dependency via Conditional Mutual Information , 2010, ECML/PKDD.
[68] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[69] Matthieu Geist,et al. Kalman Temporal Differences , 2010, J. Artif. Intell. Res..
[70] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[71] Marc Peter Deisenroth,et al. Efficient reinforcement learning using Gaussian processes , 2010 .
[72] Alessandro Lazaric,et al. LSTD with Random Projections , 2010, NIPS.
[73] Andrew W. Fitzgibbon,et al. A fast natural Newton method , 2010, ICML.
[74] Lance Sherry,et al. Accuracy of reinforcement learning algorithms for predicting aircraft taxi-out times: A case-study of Tampa Bay departures , 2010 .
[75] Matthew W. Hoffman,et al. Finite-Sample Analysis of Lasso-TD , 2011, ICML.
[76] Matthew W. Hoffman,et al. Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization , 2011, EWRL.
[77] Matthieu Geist,et al. ℓ1-Penalized Projected Bellman Residual , 2011, EWRL.
[78] Matthieu Geist,et al. Recursive Least-Squares Learning with Eligibility Traces , 2011, EWRL.
[79] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[80] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[81] Alborz Geramifard,et al. Online Discovery of Feature Dependencies , 2011, ICML.
[82] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[83] Matthieu Geist,et al. A Dantzig Selector Approach to Temporal Difference Learning , 2012, ICML.
[84] Patrick M. Pilarski,et al. Tuning-free step-size adaptation , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[85] Andrew G. Barto,et al. Adaptive Step-Size for Online Temporal Difference Learning , 2012, AAAI.
[86] Dominik Meyer,et al. L1 Regularized Gradient Temporal-Difference Learning , 2012, EWRL 2012.
[87] Bo Liu,et al. Regularized Off-Policy TD-Learning , 2012, NIPS.
[88] Ronald Parr,et al. Greedy Algorithms for Sparse Reinforcement Learning , 2012, ICML.
[89] Ronald E. Parr,et al. L1 Regularized Linear Temporal Difference Learning , 2012 .
[90] B. A. Pires,et al. Statistical analysis of L1-penalized linear estimation with applications , 2012 .
[91] Alborz Geramifard,et al. Batch-iFDD for Representation Expansion in Large MDPs , 2013, UAI.
[92] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..