CEMAB: A Cross-Entropy-based Method for Large-Scale Multi-Armed Bandits
[1] Jason L. Loeppky, et al. A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit, 2015, ArXiv.
[2] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[3] Michael L. Littman, et al. The Cross-Entropy Method Optimizes for Quantiles, 2013, ICML.
[4] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[5] Dirk P. Kroese, et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning, 2004.
[6] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[7] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[8] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.
[9] Csaba Szepesvári, et al. X-armed Bandits, 2022.
[10] Chris Watkins, et al. Learning from delayed rewards, 1989.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[13] David H. Ackley, et al. The effects of selection on noisy fitness optimization, 2011, GECCO '11.