[1] Benjamin Van Roy, et al. An Information-Theoretic Analysis of Thompson Sampling, 2014, J. Mach. Learn. Res.
[2] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[3] L. Wasserman, et al. Rates of convergence of posterior distributions, 2001.
[4] E. Kaufmann. On Bayesian index policies for sequential resource allocation, 2016, arXiv:1601.01190.
[5] M. DeGroot. Optimal Statistical Decisions, 1970.
[6] Guillaume Carlier, et al. Barycenters in the Wasserstein Space, 2011, SIAM J. Math. Anal.
[7] Tze Leung Lai, et al. Asymptotic Solutions of Bandit Problems, 1988.
[8] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[9] Akimichi Takemura, et al. An asymptotically optimal policy for finite support models in the multiarmed bandit problem, 2009, Machine Learning.
[10] Aurélien Garivier, et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016, Math. Oper. Res.
[11] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[12] R. Durrett. Probability: Theory and Examples, 1993.
[13] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[14] Pierre Senellart, et al. Adaptive Web Crawling Through Structure-Based Link Classification, 2015, ICADL.
[15] S. Kullback, et al. Information Theory and Statistics, 1959.
[16] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[17] Edwin T. Jaynes. Prior Probabilities, 2010, Encyclopedia of Machine Learning.
[18] Theja Tulabandhula, et al. Pure Exploration in Episodic Fixed-Horizon Markov Decision Processes, 2017, AAMAS.
[19] Steven L. Scott, et al. A modern Bayesian look at the multi-armed bandit, 2010.
[20] L. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory, 1986.
[21] José Niño-Mora, et al. Computing a Classic Index for Finite-Horizon Bandits, 2011, INFORMS J. Comput.
[22] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[23] Alessandro Lazaric, et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, 2012, NIPS.
[24] Shun-ichi Amari, et al. Methods of information geometry, 2000.
[25] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[26] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[27] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[28] J. Bernardo, et al. Psi (Digamma) Function, 1976.
[29] Long Tran-Thanh, et al. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation, 2015, NIPS.
[30] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[31] R. Bellman. A Problem in the Sequential Design of Experiments, 1954.
[32] Sanjay Shakkottai, et al. Regret of Queueing Bandits, 2016, NIPS.
[33] Wei Chen, et al. Combinatorial Pure Exploration of Multi-Armed Bandits, 2014, NIPS.
[34] Sébastien Bubeck, et al. Multiple Identifications in Multi-Armed Bandits, 2012, ICML.
[35] I. Csiszár. Sanov Property, Generalized $I$-Projection and a Conditional Limit Theorem, 1984.
[36] Chien-Ju Ho, et al. Online Task Assignment in Crowdsourcing Markets, 2012, AAAI.
[37] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[38] Daniel Russo, et al. Simple Bayesian Algorithms for Best Arm Identification, 2016, COLT.
[39] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[40] B. O. Koopman. On distributions admitting a sufficient statistic, 1936.
[41] Shie Mannor, et al. Thompson Sampling for Learning Parameterized Markov Decision Processes, 2014, COLT.
[42] T. L. Lai, H. Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[43] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[44] David H. Wolpert, et al. Bandit problems and the exploration/exploitation tradeoff, 1998, IEEE Trans. Evol. Comput.
[45] W. Wong, et al. Probability inequalities for likelihood ratios and convergence rates of sieve MLEs, 1995.
[46] Shivaram Kalyanakrishnan, et al. Information Complexity in Bandit Subset Selection, 2013, COLT.