Reinforcement Learning is Direct Adaptive Optimal Control
Richard S. Sutton | Andrew G. Barto | Ronald J. Williams
[1] P. Mandl et al., Estimation and control in Markov chains, 1974, Advances in Applied Probability.
[2] Ian H. Witten et al., An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[3] V. Borkar et al., Adaptive control of Markov chains, I: Finite parameter set, 1979.
[4] P. Kumar et al., Optimal adaptive controllers for unknown Markov chains, 1982.
[5] Richard S. Sutton et al., Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[6] Richard S. Sutton et al., Temporal credit assignment in reinforcement learning, 1984.
[7] Donald A. Berry et al., Bandit Problems: Sequential Allocation of Experiments, 1986.
[8] P. Anandan et al., Pattern-recognizing stochastic learning automata, 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[9] Richard Wheeler et al., Decentralized learning in finite Markov chains, 1985, 1985 24th IEEE Conference on Decision and Control.
[10] Richard S. Sutton et al., Training and Tracking in Robotics, 1985, IJCAI.
[11] Paul J. Werbos et al., Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[12] Mitsuo Sato et al., Learning control of finite Markov chains with an explicit trade-off between estimation and control, 1988, IEEE Trans. Syst. Man Cybern.
[13] Paul J. Werbos et al., Neural networks for control and system identification, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[14] C. W. Anderson et al., Learning to control an inverted pendulum using neural networks, 1989, IEEE Control Systems Magazine.
[15] Richard S. Sutton et al., Learning and Sequential Decision Making, 1989.
[16] A. Jalali et al., Computationally efficient adaptive control algorithms for Markov chains, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[17] Richard E. Korf et al., Real-Time Heuristic Search, 1990, Artif. Intell.
[18] Richard S. Sutton et al., Time-Derivative Models of Pavlovian Reinforcement, 1990.
[19] Andrew G. Barto et al., Connectionist learning for control: an overview, 1990.
[20] Andrew G. Barto et al., On the Computational Economics of Reinforcement Learning, 1991.
[21] Richard S. Sutton et al., Dyna, an integrated architecture for learning, planning, and reacting, 1990, SGAR.
[22] Long-Ji Lin et al., Self-improving reactive agents: case studies of reinforcement learning frameworks, 1991.
[23] Sridhar Mahadevan et al., Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.