Learning Evaluation Functions for Large Acyclic Domains
[1] P. W. Jones, et al. Bandit Problems, Sequential Allocation of Experiments, 1987.
[2] P. Dayan. The Convergence of TD(λ) for General λ, 1992, Machine Learning.
[3] G. Tesauro. Practical Issues in Temporal Difference Learning, 1992.
[4] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[5] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[6] Michael O. Duff, et al. Q-Learning for Bandit Problems, 1995, ICML.
[7] Dimitri P. Bertsekas, et al. A Counterexample to Temporal Differences Learning, 1995, Neural Computation.
[8] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[9] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[10] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[11] John N. Tsitsiklis, et al. Analysis of Temporal-Difference Learning with Function Approximation, 1996, NIPS.
[12] R. K. Shyamasundar, et al. Introduction to Algorithms, 1996.