A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms
[1] Stephen Grossberg,et al. Embedding fields: A theory of learning with physiological implications , 1969 .
[2] Harold J. Kushner,et al. Stochastic approximation methods for constrained and unconstrained systems , 1978 .
[3] Michel Installe,et al. Stochastic approximation methods , 1978 .
[4] Stef Tijs,et al. Fictitious play applied to sequences of games and discounted stochastic games , 1982 .
[5] Paul J. Schweitzer,et al. Aggregation Methods for Large Markov Chains , 1983, Computer Performance and Reliability.
[6] G. Owen,et al. Game Theory (2nd Ed.). , 1983 .
[7] H. Robbins,et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .
[8] M. Kurano. LEARNING ALGORITHMS FOR MARKOV DECISION PROCESSES , 1987 .
[9] C. Watkins. Learning from delayed rewards , 1989 .
[10] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[11] D. Bertsekas,et al. Adaptive aggregation methods for infinite horizon dynamic programming , 1989 .
[12] A. Barto,et al. Learning and Sequential Decision Making , 1989 .
[13] Richard S. Sutton,et al. Learning and Sequential Decision Making , 1989 .
[14] Richard E. Korf,et al. Real-Time Heuristic Search , 1990, Artif. Intell..
[15] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[16] M. Gabriel,et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks , 1990 .
[17] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[18] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[19] Andrew G. Barto,et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms , 1993, NIPS.
[20] Ronald J. Williams,et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Cr , 1993 .
[21] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[22] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration , 1994, AAAI.
[23] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[24] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[25] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[26] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[27] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[28] Carlos H. C. Ribeiro. Attentional Mechanisms as a Strategy for Generalization in the Q-Learning Algorithm , 1995 .
[29] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[30] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[31] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[32] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[33] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 1996, Machine Learning.
[34] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[35] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[36] Michael L. Littman,et al. Algorithms for Sequential Decision Making , 1996 .
[37] Leon A. Petrosyan,et al. Game Theory (Second Edition) , 1996 .
[38] Csaba Szepesvári,et al. Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms , 1996 .
[39] Harold J. Kushner,et al. Stochastic Approximation Algorithms and Applications , 1997, Applications of Mathematics.
[40] Csaba Szepesvári,et al. The Asymptotic Convergence-Rate of Q-learning , 1997, NIPS.
[41] B. Kermanshahi,et al. Multiagent reinforcement learning , 1998 .
[42] Csaba Szepesvári. Static and Dynamic Aspects of Optimal Sequential Decision Making , 1998 .
[43] Michael P. Wellman,et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.
[44] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[45] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[46] Tamer Basar,et al. Analysis of Recursive Stochastic Algorithms , 2001 .