PAC optimal MDP planning with application to invasive species management
Thomas G. Dietterich | Mark Crowley | Majid Alkaee Taleghan | Kim Hall | H. Jo Albers
[1] Shie Mannor, et al. Lightning Does Not Strike Twice: Robust MDPs with Coupled Uncertainty, 2012, ICML.
[2] Thomas J. Walsh, et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning, 2010, AAAI.
[3] Thomas G. Dietterich, et al. Allowing a wildfire to burn: estimating the effect on future fire suppression costs, 2013.
[4] Michael H. Bowling, et al. Apprenticeship learning using linear programming, 2008, ICML '08.
[5] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] H. Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, 1952.
[7] Reid G. Simmons, et al. Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic, 2006, AAAI.
[8] Lawrence K. Saul, et al. Large Deviation Methods for Approximate Probabilistic Inference, 1998, UAI.
[9] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[10] Thomas G. Dietterich, et al. PAC Optimal Planning for Invasive Species Management: Improved Exploration for Reinforcement Learning from Simulator-Defined MDPs, 2013, AAAI.
[11] R. Khan, et al. Sequential Tests of Statistical Hypotheses, 1972.
[12] Andrea Rinaldo, et al. On biodiversity in river networks: A trade-off metapopulation model and comparative analysis, 2007.
[13] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.
[14] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[15] David A. McAllester, et al. On the Convergence Rate of Good-Turing Estimators, 2000, COLT.
[16] Csaba Szepesvári, et al. Model-based reinforcement learning with nearly tight exploration complexity bounds, 2010, ICML.
[17] Lucian Busoniu, et al. Optimistic planning for Markov decision processes, 2012, AISTATS.
[18] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[19] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[20] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[21] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[22] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[23] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[24] Shie Mannor, et al. Scaling Up Robust MDPs using Function Approximation, 2014, ICML.
[25] Claude-Nicolas Fiechter, et al. Efficient reinforcement learning, 1994, COLT '94.
[26] Luis E. Ortiz, et al. Concentration Inequalities for the Missing Mass and for Histogram Rule Error, 2003, J. Mach. Learn. Res.
[27] Hilbert J. Kappen, et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[28] Michael L. Littman, et al. An empirical evaluation of interval estimation for Markov decision processes, 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.
[29] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[30] Claude-Nicolas Fiechter, et al. Design and analysis of efficient reinforcement learning algorithms, 1997.
[31] Paul Valiant, et al. Estimating the Unseen, 2013, NIPS.
[32] Alon Orlitsky, et al. Always Good Turing: Asymptotically Optimal Probability Estimation, 2003, Science.
[33] I. Good. The Population Frequencies of Species and the Estimation of Population Parameters, 1953.