Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Cr
暂无分享,去创建一个
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control..
[3] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[4] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[5] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[6] Paul J. Werbos,et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research , 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[7] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[8] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[9] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[10] L. Baird,et al. A MATHEMATICAL ANALYSIS OF ACTOR-CRITIC ARCHITECTURES FOR LEARNING OPTIMAL CONTROLS THROUGH INCREMENTAL DYNAMIC PROGRAMMING (cid:3) , 1990 .
[11] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[12] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.
[13] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[14] Andrew W. Moore,et al. Memory-based Reinforcement Learning: Converging with Less Data and Less Real Time , 1993 .
[15] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[16] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..