Optimal Best Arm Identification with Fixed Confidence
暂无分享,去创建一个
[1] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[2] Aurélien Garivier,et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , 2014, J. Mach. Learn. Res..
[3] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.
[4] Csaba Szepesvári,et al. –armed Bandits , 2022 .
[5] Varun Grover,et al. Active Learning in Multi-armed Bandits , 2008, ALT.
[6] Aurélien Garivier,et al. A mdl approach to hmm with Poisson and Gaussian emissions. Application to order identification , 2005 .
[7] Alexandre Proutière,et al. Unimodal Bandits without Smoothness , 2014, ArXiv.
[8] Aurélien Garivier. Consistency of the Unlimited BIC Context Tree Estimator , 2006, IEEE Transactions on Information Theory.
[9] Ambuj Tewari,et al. PAC Subset Selection in Stochastic Multi-armed Bandits , 2012, ICML.
[10] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..
[11] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.
[12] Rajesh Sundaresan,et al. Learning to Detect an Oddball Target , 2015, IEEE Transactions on Information Theory.
[13] P. Grünwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning) , 2007 .
[14] Walter T. Federer,et al. Sequential Design of Experiments , 1967 .
[15] Jorma Rissanen,et al. The Minimum Description Length Principle in Coding and Modeling , 1998, IEEE Trans. Inf. Theory.
[16] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[17] J. Rissanen,et al. Modeling By Shortest Data Description* , 1978, Autom..
[18] Shivaram Kalyanakrishnan,et al. Information Complexity in Bandit Subset Selection , 2013, COLT.
[19] John N. Tsitsiklis,et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem , 2004, J. Mach. Learn. Res..
[20] Aurélien Garivier,et al. On the Complexity of A/B Testing , 2014, COLT.
[21] Matthew Malloy,et al. lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits , 2013, COLT.
[22] Alessandro Lazaric,et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence , 2012, NIPS.
[23] Andreas Krause,et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting , 2009, IEEE Transactions on Information Theory.
[24] Frans M. J. Willems,et al. The context-tree weighting method: basic properties , 1995, IEEE Trans. Inf. Theory.
[25] Alexandre Proutière,et al. Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms , 2014, COLT.
[26] Rémi Munos,et al. From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning , 2014, Found. Trends Mach. Learn..
[27] T. L. Graves,et al. Asymptotically Efficient Adaptive Choice of Control Laws inControlled Markov Chains , 1997 .
[28] Raphail E. Krichevsky,et al. The performance of universal encoding , 1981, IEEE Trans. Inf. Theory.
[29] R. Munos,et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation , 2012, 1210.1136.
[30] Csaba Szepesvári,et al. Structured Best Arm Identification with Fixed Confidence , 2017, ALT.