[1] Steven L. Scott, et al. A modern Bayesian look at the multi-armed bandit, 2010.
[2] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[3] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[4] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[5] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[6] Rudolf Ahlswede, et al. Maximum Number of Constant Weight Vertices of the Unit n-Cube Contained in a k-Dimensional Subspace, 2003, Combinatorica.
[7] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[8] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[9] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[10] Branko Ristic, et al. Beyond the Kalman Filter: Particle Filters for Tracking Applications, 2004.
[11] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[12] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[13] Daniel A. Braun, et al. A Minimum Relative Entropy Principle for Learning and Acting, 2008, J. Artif. Intell. Res.
[14] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.