A Study of Off-policy Learning in Computational Sustainability
[1] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[2] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[3] Sanjoy Dasgupta, et al. Off-Policy Temporal Difference Learning with Function Approximation, 2001, ICML.
[4] Christopher J. Fonnesbeck, et al. Solving Dynamic Wildlife Resource Optimization Problems Using Reinforcement Learning, 2005.
[5] David R. Anderson. Optimal Exploitation Strategies for an Animal Population in a Markovian Environment: A Theory and an Example, 1975.
[6] David B. Dunson, et al. Approximate Dynamic Programming for Storage Problems, 2011, ICML.
[7] John N. Tsitsiklis, et al. Bias and Variance Approximation in Value Function Estimates, 2007, Manag. Sci.
[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[9] Joelle Pineau, et al. Treating Epilepsy via Adaptive Neurostimulation: a Reinforcement Learning Approach, 2009, Int. J. Neural Syst.
[10] E. Ziegel. Modern Mathematical Statistics, 1989.
[11] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[12] Naoki Abe, et al. Optimizing debt collections using constrained reinforcement learning, 2010, KDD.
[13] S. Murphy, et al. An experimental design for the development of adaptive treatment strategies, 2005, Statistics in Medicine.
[14] Reuven Y. Rubinstein, et al. Simulation and the Monte Carlo method, 1981, Wiley Series in Probability and Mathematical Statistics.
[15] A. Tsiatis. Semiparametric Theory and Missing Data, 2006.
[16] Joseph Kang, et al. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2007, arXiv:0804.2958.
[17] Justin A. Boyan, et al. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[18] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[19] J. Robins, et al. Marginal Structural Models and Causal Inference in Epidemiology, 2000, Epidemiology.