[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] J. Gillis,et al. Matrix Iterative Analysis , 1961 .
[3] A G Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
[4] A. Klopf. A neuronal model of classical conditioning , 1988 .
[5] Richard S. Sutton,et al. Time-Derivative Models of Pavlovian Reinforcement , 1990 .
[6] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[7] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[8] Richard S. Sutton,et al. TD Models: Modeling the World at a Mixture of Time Scales , 1995, ICML.
[9] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[10] Gavin Adrian Rummery. Problem solving with reinforcement learning , 1995 .
[11] Geoffrey J. Gordon. Stable Fitted Reinforcement Learning , 1995, NIPS.
[12] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[13] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[14] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[15] Peter Dayan,et al. A Neural Substrate of Prediction and Reward , 1997, Science.
[16] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[17] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[18] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[19] Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..
[20] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[21] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[22] Roland E. Suri,et al. TD models of reward predictive responses in dopamine neurons , 2002, Neural Networks.
[23] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[24] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[25] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[26] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[27] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[28] Gerald Tesauro,et al. Practical issues in temporal difference learning , 1992, Machine Learning.
[29] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[30] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[31] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[32] Richard S. Sutton,et al. Temporal Abstraction in Temporal-difference Networks , 2005, NIPS.
[33] Y. Niv,et al. Dialogues on prediction errors , 2008, Trends in Cognitive Sciences.
[34] P. Dayan,et al. Reinforcement learning: The Good, The Bad and The Ugly , 2008, Current Opinion in Neurobiology.
[35] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[36] R. Sutton. The Grand Challenge of Predictive Empirical Abstract Knowledge , 2009 .
[37] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[38] Huizhen Yu,et al. Convergence of Least Squares Temporal Difference Methods Under General Conditions , 2010, ICML.
[39] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[41] J. Zico Kolter,et al. The Fixed Points of Off-Policy TD , 2011, NIPS.
[42] Patrick M. Pilarski,et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction , 2011, AAMAS.
[43] Richard S. Sutton,et al. Beyond Reward: The Problem of Knowledge and Data , 2011, ILP.
[44] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[45] Huizhen Yu,et al. Least Squares Temporal Difference Methods: An Analysis under General Conditions , 2012, SIAM J. Control. Optim..
[46] J. O'Doherty,et al. Beyond simple reinforcement learning: the computational neurobiology of reward‐learning and valuation , 2012, The European journal of neuroscience.
[47] Elliot A. Ludvig,et al. Evaluating the TD model of classical conditioning , 2012, Learning & behavior.
[48] Leah M Hackman,et al. Faster Gradient-TD Algorithms , 2013 .
[49] Dimitri P. Bertsekas,et al. Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems , 2014, Math. Oper. Res..
[50] Richard S. Sutton,et al. Weighted importance sampling for off-policy learning with linear function approximation , 2014, NIPS.
[51] Philip Thomas,et al. Bias in Natural Actor-Critic Algorithms , 2014, ICML.
[52] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[53] Richard S. Sutton,et al. True Online TD(λ) , 2014, ICML.
[54] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..
[56] Doina Precup,et al. A new Q(λ) with interim forward view and Monte Carlo equivalence , 2014, ICML.
[57] Richard S. Sutton,et al. Multi-timescale nexting in a reinforcement learning robot , 2011, Adapt. Behav..
[59] Richard S. Sutton,et al. True Online Emphatic TD(λ): Quick Reference and Implementation Guide , 2015, ArXiv.
[60] Huizhen Yu,et al. On Convergence of Emphatic Temporal-Difference Learning , 2015, COLT.
[61] Adam M White,et al. Developing a Predictive Approach to Knowledge , 2015 .
[62] Richard S. Sutton,et al. Off-policy learning based on weighted importance sampling with linear computational complexity , 2015, UAI.
[64] Martha White,et al. Emphatic Temporal-Difference Learning , 2015, ArXiv.
[65] Huizhen Yu,et al. Weak Convergence Properties of Constrained Emphatic Temporal-difference Learning with Constant and Slowly Diminishing Stepsize , 2015, J. Mach. Learn. Res..
[66] Shie Mannor,et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis , 2015, AAAI.