A Generalized Reinforcement-Learning Model: Convergence and Applications
[1] L. Shapley, et al. Stochastic Games, 1953, Proceedings of the National Academy of Sciences.
[2] Cyrus Derman, et al. Finite State Markovian Decision Processes, 1970.
[3] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[4] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[5] Richard E. Korf, et al. Real-Time Heuristic Search, 1990, Artif. Intell.
[6] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[7] Anne Condon, et al. The Complexity of Stochastic Games, 1992, Inf. Comput.
[8] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[9] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[10] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[11] Andrew G. Barto, et al. Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms, 1993, NIPS.
[12] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[13] George H. John. When the Best Move Isn't Optimal: Q-learning with Exploration, 1994, AAAI.
[14] Sebastian Thrun, et al. Learning to Play the Game of Chess, 1994, NIPS.
[15] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.
[16] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[17] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[18] Matthias Heger, et al. Consideration of Risk in Reinforcement Learning, 1994, ICML.
[19] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[20] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[21] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[22] Csaba Szepesvári, et al. General Framework for Reinforcement Learning, 1995.
[23] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[24] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[25] Matthias Heger. The Loss from Imperfect Value Functions in Expectation-Based and Minimax-Based Tasks, 1996, Machine Learning.
[26] Csaba Szepesvári, et al. Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms, 1996.