Information Complexity in Bandit Subset Selection
暂无分享,去创建一个
[1] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[2] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[3] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[4] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[5] Shie Mannor,et al. Action Elimination and Stopping Conditions for Reinforcement Learning , 2003, ICML.
[6] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[7] Andrew W. Moore,et al. The Racing Algorithm: Model Selection for Lazy Learners , 1997, Artificial Intelligence Review.
[8] Csaba Szepesvári,et al. Empirical Bernstein stopping , 2008, ICML '08.
[9] Christian Igel,et al. Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search , 2009, ICML '09.
[10] Peter Stone,et al. Efficient Selection of Multiple Bandit Arms: Theory and Practice , 2010, ICML.
[11] Dominik D. Freydenberger,et al. Can We Learn to Gamble Efficiently? , 2010, COLT.
[12] Rémi Munos,et al. Pure exploration in finitely-armed and continuous-armed bandits , 2011, Theor. Comput. Sci..
[13] Shivaram Kalyanakrishnan. Learning Methods for Sequential Decision Making with Imperfect Representations by Shivaram Kalyanakrishnan , 2011 .
[14] Rémi Munos,et al. A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences , 2011, COLT.
[15] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[16] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[17] Ambuj Tewari,et al. PAC Subset Selection in Stochastic Multi-armed Bandits , 2012, ICML.
[18] Alessandro Lazaric,et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence , 2012, NIPS.
[19] R. Munos,et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation , 2012, 1210.1136.
[20] Tengyao Wang,et al. Multiple Identications in Multi-Armed Bandits , 2013 .
[21] Sébastien Bubeck,et al. Multiple Identifications in Multi-Armed Bandits , 2012, ICML.