BelMan: An Information-Geometric Approach to Stochastic Bandits
[1] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[2] Shie Mannor,et al. Thompson Sampling for Learning Parameterized Markov Decision Processes , 2014, COLT.
[3] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[4] Aurélien Garivier,et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems , 2016, Math. Oper. Res..
[5] Pierre Senellart,et al. Adaptive Web Crawling Through Structure-Based Link Classification , 2015, ICADL.
[6] L. Wasserman,et al. Rates of convergence of posterior distributions , 2001 .
[7] E. Kaufmann. On Bayesian index policies for sequential resource allocation , 2016, 1601.01190.
[8] R. Durrett. Probability: Theory and Examples , 1993 .
[9] José Niño-Mora. Computing a Classic Index for Finite-Horizon Bandits , 2011 .
[10] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[11] Edwin T. Jaynes. Prior Probabilities , 2010, Encyclopedia of Machine Learning.
[12] Theja Tulabandhula,et al. Pure Exploration in Episodic Fixed-Horizon Markov Decision Processes , 2017, AAMAS.
[13] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[14] Benjamin Van Roy,et al. An Information-Theoretic Analysis of Thompson Sampling , 2014, J. Mach. Learn. Res..
[15] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..
[16] M. Degroot. Optimal Statistical Decisions , 1970 .
[17] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[18] Daniel Russo,et al. Simple Bayesian Algorithms for Best Arm Identification , 2016, COLT.
[19] W. Wong,et al. Probability inequalities for likelihood ratios and convergence rates of sieve MLEs , 1995 .
[20] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[21] Sanjay Shakkottai,et al. Regret of Queueing Bandits , 2016, NIPS.
[22] Shivaram Kalyanakrishnan,et al. Information Complexity in Bandit Subset Selection , 2013, COLT.
[23] Long Tran-Thanh,et al. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation , 2015, NIPS.
[24] Solomon Kullback,et al. Information Theory and Statistics , 1960 .
[25] David H. Wolpert,et al. Bandit problems and the exploration/exploitation tradeoff , 1998, IEEE Trans. Evol. Comput..
[26] Rémi Munos,et al. Pure Exploration in Multi-armed Bandits Problems , 2009, ALT.
[27] L. Brown. Fundamentals of statistical exponential families: with applications in statistical decision theory , 1986 .
[28] Guillaume Carlier,et al. Barycenters in the Wasserstein Space , 2011, SIAM J. Math. Anal..
[29] Tor Lattimore,et al. On Explore-Then-Commit strategies , 2016, NIPS.
[30] I. Csiszár. Sanov Property, Generalized I-Projection and a Conditional Limit Theorem , 1984 .
[31] F. Barbaresco. Information Geometry of Covariance Matrix: Cartan-Siegel Homogeneous Bounded Domains, Mostow/Berger Fibration and Fréchet Median , 2013 .
[32] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[33] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[34] R. Bellman. A Problem in the Sequential Design of Experiments , 1954 .