Ballooning Multi-Armed Bandits
[1] Peter Auer, et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, 2010, Period. Math. Hung.
[2] Wei Tang, et al. Bandit Learning with Biased Human Feedback, 2019, AAMAS.
[3] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[4] Robert W. Chen, et al. Bandit problems with infinitely many arms, 1997.
[5] W. Hoeffding. On the Distribution of the Number of Successes in Independent Trials, 1956.
[6] Yang Liu, et al. Incentivizing High Quality User Contributions: New Arm Generation in Bandit Learning, 2018, AAAI.
[7] Jean-Yves Audibert, et al. Regret Bounds and Minimax Policies under Partial Monitoring, 2010, J. Mach. Learn. Res.
[8] Aleksandrs Slivkins, et al. Introduction to Multi-Armed Bandits, 2019, Found. Trends Mach. Learn.
[9] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[10] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[11] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[12] Lilian Besson, et al. What Doubling Tricks Can and Can't Do for Multi-Armed Bandits, 2018, ArXiv.
[13] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[14] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[15] Y. Narahari, et al. Analysis of Thompson Sampling for Stochastic Sleeping Bandits, 2017, UAI.
[16] P. Whittle. Arm-Acquiring Bandits, 1981.
[17] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[18] Benjamin Van Roy, et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.
[19] Moshe Babaioff, et al. Characterizing truthful multi-armed bandit mechanisms: extended abstract, 2009, EC '09.
[20] Aurélien Garivier, et al. A minimax and asymptotically optimal algorithm for stochastic bandits, 2017, ALT.
[21] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[22] Sujit Gujar, et al. A quality assuring, cost optimal multi-armed bandit mechanism for expertsourcing, 2018, Artif. Intell.
[23] Gaston H. Gonnet, et al. On the Lambert W function, 1996, Adv. Comput. Math.
[24] Patrick Hummel, et al. Learning and incentives in user-generated content: multi-armed bandits with endogenous arms, 2013, ITCS '13.
[25] Setareh Maghsudi, et al. Joint Channel Selection and Power Control in Infrastructureless Wireless Networks: A Multiplayer Multiarmed Bandit Framework, 2014, IEEE Transactions on Vehicular Technology.
[26] Jure Leskovec, et al. Discovering value from community activity on focused question answering sites: a case study of stack overflow, 2012, KDD.
[27] Jack Bowden, et al. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges, 2015, Statistical Science.
[28] Robert D. Kleinberg, et al. Regret bounds for sleeping experts and bandits, 2010, Machine Learning.
[29] Kristina Lerman, et al. The myopia of crowds: Cognitive load and collective evaluation of answers on Stack Exchange, 2016, PLoS ONE.
[30] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[31] Michal Valko, et al. Simple regret for infinitely many armed bandits, 2015, ICML.
[32] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[33] A. Hoorfar, et al. Inequalities on the Lambert W function and hyperpower function, 2008.
[34] Aurélien Garivier, et al. KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints, 2018, J. Mach. Learn. Res.
[35] R. Devanand, et al. Empirical study of Thompson sampling: Tuning the posterior parameters, 2017.
[36] Rémi Munos, et al. Algorithms for Infinitely Many-Armed Bandits, 2008, NIPS.
[37] Vianney Perchet, et al. Anytime optimal algorithms in stochastic multi-armed bandits, 2016, ICML.