A Generalization Error for Q-Learning
[1] R. Bellman. Dynamic programming, 1957, Science.
[2] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.
[3] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[4] Jon A. Wellner, et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.
[5] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[6] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[7] Claude-Nicolas Fiechter. Expected Mistake Bound Model for On-Line Reinforcement Learning, 1997, ICML.
[8] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.
[9] Douglas H. Fisher, et al. Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), Nashville, Tennessee, USA, July 8-12, 1997, 1997, ICML.
[10] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[11] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.
[12] H. Sung, et al. Evaluating multiple treatment courses in clinical trials, 2000, Statistics in Medicine.
[13] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[14] K. Davis, et al. National Institute of Mental Health Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE): Alzheimer disease trial methodology, 2001, The American Journal of Geriatric Psychiatry: Official Journal of the American Association for Geriatric Psychiatry.
[15] M. Altfeld, et al. Less is more? STI in acute and chronic HIV-1 infection, 2001, Nature Medicine.
[16] H. Sung, et al. Selecting Therapeutic Strategies Based on Efficacy and Death in Multicourse Clinical Trials, 2002.
[17] R. Brooner, et al. Using Behavioral Reinforcement To Improve Methadone Treatment Participation, 2002, Science & Practice Perspectives.
[18] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[19] Alexander L. Miller, et al. Texas Medication Algorithm Project, phase 3 (TMAP-3): rationale and study design, 2003, The Journal of Clinical Psychiatry.
[20] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[21] D. Kupfer, et al. Background and rationale for the sequenced treatment alternatives to relieve depression (STAR*D) study, 2003, The Psychiatric Clinics of North America.
[22] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[23] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[24] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 2005, IEEE Transactions on Neural Networks.
[25] M. Plotkin. Nature as medicine, 2005, Explore.
[26] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[27] John N. Tsitsiklis, et al. Dynamic Catalog Mailing Policies, 2006, Manag. Sci.