[1] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[2] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[3] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[4] Christos Dimitrakakis, et al. Rollout Sampling Approximate Policy Iteration, 2008, Machine Learning.
[5] O. François, et al. Approximate Bayesian Computation (ABC) in Practice, 2010, Trends in Ecology & Evolution.
[6] Andrew G. Barto, et al. Reinforcement Learning, 1998.
[7] David Welch, et al. Approximate Bayesian Computation Scheme for Parameter Inference and Model Selection in Dynamical Systems, 2009, Journal of The Royal Society Interface.
[8] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[9] Jean-Michel Marin, et al. Approximate Bayesian Computational Methods, 2011, Statistics and Computing.
[10] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[11] Joelle Pineau, et al. Bayes-Adaptive POMDPs, 2007, NIPS.
[12] Christos Dimitrakakis, et al. Robust Bayesian Reinforcement Learning through Tight Lower Bounds, 2011, EWRL.
[13] Thomas A. Dean, et al. Asymptotic Behaviour of Approximate Bayesian Estimators, 2011, arXiv:1105.3655.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Dimitri P. Bertsekas, et al. Nonlinear Programming, 1997.
[16] Andrew Y. Ng, et al. Near-Bayesian Exploration in Polynomial Time, 2009, ICML '09.
[17] Sumeetpal S. Singh, et al. Filtering via Approximate Bayesian Computation, 2010, Statistics and Computing.
[18] Rémi Munos, et al. Thompson Sampling: An Optimal Finite Time Analysis, 2012, arXiv.
[19] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[20] D. Bertsekas. Rollout Algorithms for Constrained Dynamic Programming, 2005.
[21] Sumeetpal S. Singh, et al. Asymptotic Behaviour of Approximate Bayesian Estimators, 2011.
[22] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[23] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996.
[24] Feng Wu, et al. Rollout Sampling Policy Iteration for Decentralized POMDPs, 2010, UAI.
[25] M. DeGroot. Optimal Statistical Decisions, 1970.
[26] John Geweke. Using Simulation Methods for Bayesian Econometric Models: Inference, Development, and Communication, 1999, Federal Reserve Bank of Minneapolis Research Department Staff Report 249.
[27] Doina Precup, et al. Using Linear Programming for Bayesian Exploration in Markov Decision Processes, 2007, IJCAI.
[28] Olivier Buffet, et al. Near-Optimal BRL Using Optimistic Local Transitions, 2012, ICML.
[29] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[30] Christos Dimitrakakis, et al. Beliefbox: A Framework for Statistical Methods in Sequential Decision Making, 2007.
[31] Bruno Scherrer, et al. Classification-Based Policy Iteration with a Critic, 2011, ICML.
[32] D. Bertsekas. Rollout Algorithms for Constrained Dynamic Programming, 2005.
[33] L. J. Savage, et al. The Foundations of Statistics, 1955.
[34] Cynthia Dwork, et al. Differential Privacy and Robust Statistics, 2009, STOC '09.
[35] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[36] Pascal Poupart, et al. Model-Based Bayesian Reinforcement Learning in Partially Observable Domains, 2008, ISAIM.
[37] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[38] Jesse Hoey, et al. An Analytic Solution to Discrete Bayesian Reinforcement Learning, 2006, ICML.
[39] I. Pinelis. On Inequalities for Sums of Bounded Random Variables, 2006, arXiv:math/0603030.
[40] Andrew G. Barto, et al. Optimal Learning: Computational Procedures for Bayes-Adaptive Markov Decision Processes, 2002.