A Generalization Error for Q-Learning

Planning problems in which a policy must be learned from a single training set of finite-horizon trajectories arise in both the social sciences and medicine. We consider Q-learning with function approximation in this setting and derive an upper bound on the generalization error. The bound is expressed in terms of quantities minimized by the Q-learning algorithm, the complexity of the approximation space, and an approximation term arising from the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
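The setting is batch Q-learning over a fixed set of finite-horizon trajectories. The paper defines the algorithm formally; the sketch below is only a minimal, illustrative version, assuming linear function approximation fit by least squares, with hypothetical names (`fit_q_functions`, the feature map `phi`) chosen for this example. Working backward from the horizon, a Q-function at each decision point t is fit by regressing the one-step targets r_t + max_a Q_{t+1}(s_{t+1}, a) on features of (s_t, a_t); the resulting empirical squared errors are of the kind the bound refers to as quantities minimized by the algorithm.

```python
import numpy as np

def fit_q_functions(trajectories, phi, n_actions, T):
    """Batch Q-learning from a single set of finite-horizon trajectories.

    A minimal sketch, not the paper's exact procedure. At each time t,
    working backward from the horizon, fit a linear Q-function by least
    squares to the one-step targets r_t + max_a Q_{t+1}(s_{t+1}, a).

    trajectories: list of [(s_0, a_0, r_0), ..., (s_{T-1}, a_{T-1}, r_{T-1})]
    phi: feature map phi(s, a) -> 1-D np.ndarray of fixed dimension
    """
    weights = [None] * T  # one weight vector per decision point
    for t in reversed(range(T)):
        X, y = [], []
        for traj in trajectories:
            s, a, r = traj[t]
            target = r
            if t + 1 < T:
                s_next = traj[t + 1][0]
                # Bootstrap with the already-fitted Q-function at t+1.
                target += max(
                    phi(s_next, a2) @ weights[t + 1] for a2 in range(n_actions)
                )
            X.append(phi(s, a))
            y.append(target)
        X, y = np.asarray(X), np.asarray(y)
        # Least-squares fit: minimizes the empirical squared error to the
        # one-step Bellman targets at decision point t.
        weights[t], *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def greedy_action(weights, phi, n_actions, t, s):
    """Greedy action at time t under the fitted Q-functions."""
    return max(range(n_actions), key=lambda a: phi(s, a) @ weights[t])
```

The fitted weights induce the greedy policy above; the generalization bound relates the value of such a policy to the optimal value, up to an estimation term governed by the complexity of the approximation space and the approximation term described in the abstract.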
