The loss from imperfect value functions in expectation-based and minimax-based tasks
[1] Kathleen Martin,et al. The Learning Machines , 1981 .
[2] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[4] C. Watkins. Learning from delayed rewards , 1989 .
[5] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[6] Sebastian Thrun,et al. Efficient Exploration In Reinforcement Learning , 1992 .
[7] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[8] Long Ji Lin,et al. Scaling Up Reinforcement Learning for Robot Control , 1993, International Conference on Machine Learning.
[9] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[10] M.A.F. Mcdonald,et al. Approximate Discounted Dynamic Programming Is Unreliable , 1994 .
[11] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[12] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1994, Mach. Learn.
[13] Matthias Heger,et al. Consideration of Risk in Reinforcement Learning , 1994, ICML.
[14] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[15] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[16] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[17] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse , 1996 .
[18] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[19] Sebastian Thrun,et al. Issues in Using Function Approximation for Reinforcement Learning , 1999 .
[20] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[21] Andrew W. Moore,et al. The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces , 1993, Machine Learning.
[22] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[23] Hamdy A. Taha,et al. Operations Research: An Introduction (8th Edition) , 2006 .