Reinforcement learning and mistake bounded algorithms
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[3] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[6] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[7] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[8] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[9] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[10] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[11] Ronald L. Rivest, et al. Introduction to Algorithms, 1990.
[12] Noga Alon, et al. The Probabilistic Method, 2015, Fundamentals of Ramsey Theory.
[13] R. Bellman. Dynamic Programming, 1957, Science.
[14] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[15] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.
[16] Andrew G. Barto, et al. Reinforcement Learning, 1998.