Policy evaluation with temporal differences: a survey and comparison

Extended abstract of the article: Christoph Dann, Gerhard Neumann, Jan Peters (2014). Policy Evaluation with Temporal Differences: A Survey and Comparison. Journal of Machine Learning Research, 15, 809-883.

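As a concrete illustration of the family of methods the article surveys, below is a minimal sketch of tabular TD(0) policy evaluation on a small random-walk problem. The environment, policy, step size, and names used here (td0_policy_evaluation, n_states, alpha, gamma) are illustrative assumptions for this sketch, not notation or code from the article.

# Minimal sketch of tabular TD(0) policy evaluation (illustrative only).
# The random-walk MDP, uniform random policy, and step size below are
# assumptions for demonstration; they are not taken from the article.
import numpy as np

def td0_policy_evaluation(n_states=5, episodes=500, alpha=0.1, gamma=1.0, seed=0):
    """Estimate the state values of a fixed uniform-random policy on a
    1-D random walk with terminal states at both ends."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states + 2)            # indices 0 and n_states+1 are terminal
    for _ in range(episodes):
        s = (n_states + 1) // 2           # start in the middle state
        while 0 < s < n_states + 1:
            s_next = s + rng.choice([-1, 1])             # random-walk step
            r = 1.0 if s_next == n_states + 1 else 0.0   # reward only at right terminal
            # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:-1]                        # values of the non-terminal states

if __name__ == "__main__":
    print(td0_policy_evaluation())

With these settings the estimates approach the known values 1/6, 2/6, ..., 5/6 for the five non-terminal states of this classic example.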