On the Significance of Markov Decision Processes
[1] R. Bellman. A Markovian Decision Process, 1957.
[2] Ian H. Witten, et al. Exploring, Modelling and Controlling Discrete Sequential Environments, 1977, Int. J. Man Mach. Stud.
[3] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[4] Richard S. Sutton, et al. Time-Derivative Models of Pavlovian Reinforcement, 1990.
[5] M. Gabriel, et al. Learning and Computational Neuroscience: Foundations of Adaptive Networks, 1990.
[6] Joel L. Davis, et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement, 1994.
[7] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[8] Gerald Tesauro, et al. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[9] Leslie Pack Kaelbling, et al. Planning under Time Constraints in Stochastic Domains, 1993, Artif. Intell.
[10] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[11] Alan Bundy, et al. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence - IJCAI-95, 1995.
[12] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[13] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[14] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[15] Joel L. Davis, et al. Models of Information Processing in the Basal Ganglia, 2008.
[16] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[17] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[18] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[19] Gerald Tesauro, et al. On-line Policy Improvement using Monte-Carlo Search, 1996, NIPS.
[20] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[21] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[22] Ashwin Ram, et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav.
[23] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[24] Benjamin Van Roy, et al. A neuro-dynamic programming approach to retailer inventory management, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.