Simple Bayesian Algorithms for Best Arm Identification
[1] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] R. Bechhofer. A Single-Sample Multiple Decision Procedure for Ranking Means of Normal Populations with Known Variances, 1954.
[3] A. Albert. The Sequential Design of Experiments for Infinitely Many States of Nature, 1961.
[4] J. Kiefer, et al. Asymptotically Optimum Sequential Inference and Design, 1963.
[5] D. Freedman. On the Asymptotic Behavior of Bayes' Estimates in the Discrete Case, 1963.
[6] E. Paulson. A Sequential Procedure for Selecting the Population with the Largest Mean from $k$ Normal Populations, 1964.
[7] Walter T. Federer, et al. Sequential Design of Experiments, 1967.
[8] H. Chernoff. Approaches in Sequential Design of Experiments, 1973.
[9] D. V. Gokhale, et al. A Survey of Statistical Design and Linear Models, 1976.
[10] Y. Rinott. On two-stage selection procedures and related probability-inequalities, 1978.
[11] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[12] I. Johnstone, et al. Asymptotically Optimal Procedures for Sequential Adaptive Selection of the Best of Several Normal Means, 1982.
[13] R. Keener. Second Order Efficiency in the Sequential Design of Experiments, 1984.
[14] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[15] D. Freedman, et al. On the consistency of Bayes estimates, 1986.
[16] R. Gray. Entropy and Information Theory, 1990, Springer New York.
[17] David Williams, et al. Probability with Martingales, 1991, Cambridge Mathematical Textbooks.
[18] S. Gupta, et al. Bayesian look ahead one-stage sampling allocations for selection of the best population, 1996.
[19] L. Wasserman, et al. The consistency of posterior distributions in nonparametric problems, 1999.
[20] Chun-Hung Chen, et al. Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization, 2000, Discret. Event Dyn. Syst.
[21] A. van der Vaart, et al. Convergence rates of posterior distributions, 2000.
[22] Stephen E. Chick, et al. New Two-Stage and Sequential Procedures for Selecting the Best Simulated System, 2001, Oper. Res.
[23] Shie Mannor, et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 2002, COLT.
[24] John N. Tsitsiklis, et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res.
[25] Peter W. Glynn, et al. A large deviations perspective on ordinal optimization, 2004, Proceedings of the 2004 Winter Simulation Conference.
[26] D. Berry. Bayesian Statistics and the Efficiency and Ethics of Clinical Trials, 2004.
[27] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[28] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[29] Barry L. Nelson, et al. Recent advances in ranking and selection, 2007, 2007 Winter Simulation Conference.
[30] Loo Hay Lee, et al. Efficient Simulation Budget Allocation for Selecting an Optimal Subset, 2008, INFORMS J. Comput.
[31] Warren B. Powell, et al. A Knowledge-Gradient Policy for Sequential Information Collection, 2008, SIAM J. Control Optim.
[32] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[33] Stephen E. Chick, et al. Economic Analysis of Simulation Selection Problems, 2009, Manag. Sci.
[34] Joaquin Quiñonero Candela, et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, 2010, ICML.
[35] Jürgen Branke, et al. Sequential Sampling to Myopically Maximize the Expected Value of Information, 2010, INFORMS J. Comput.
[36] Dominik D. Freydenberger, et al. Can We Learn to Gamble Efficiently?, 2010, COLT.
[37] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[38] Peter I. Frazier, et al. Sequential Sampling with Economics of Selection Procedures, 2012, Manag. Sci.
[39] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[40] Tara Javidi, et al. Active Sequential Hypothesis Testing, 2012, arXiv.
[41] Warren B. Powell, et al. The Knowledge Gradient Algorithm for a General Class of Online Learning Problems, 2012, Oper. Res.
[42] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[43] S. Kakade, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2012, IEEE Transactions on Information Theory.
[44] Alessandro Lazaric, et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, 2012, NIPS.
[45] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[46] Oren Somekh, et al. Almost Optimal Exploration in Multi-Armed Bandits, 2013, ICML.
[47] Susan R. Hunter, et al. Optimal Sampling Laws for Stochastically Constrained Simulation Optimization on Finite Sets, 2013, INFORMS J. Comput.
[48] Shivaram Kalyanakrishnan, et al. Information Complexity in Bandit Subset Selection, 2013, COLT.
[49] George Atia, et al. Controlled Sensing for Multihypothesis Testing, 2012, IEEE Transactions on Automatic Control.
[50] Liang Tang, et al. Automatic ad format selection via contextual bandits, 2013, CIKM.
[51] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[52] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[53] Loo Hay Lee, et al. Stochastically Constrained Ranking and Selection via SCORE, 2014, ACM Trans. Model. Comput. Simul.
[54] Robert D. Nowak, et al. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting, 2014, 48th Annual Conference on Information Sciences and Systems (CISS).
[55] Peter I. Frazier, et al. A Fully Sequential Elimination Procedure for Indifference-Zone Ranking and Selection with Tight Bounds on Probability of Correct Selection, 2014, Oper. Res.
[56] Shie Mannor, et al. Thompson Sampling for Complex Online Problems, 2013, ICML.
[57] Benjamin Van Roy, et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.
[58] Jack Bowden, et al. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges, 2015, Statistical Science.
[59] P. Glynn, et al. Ordinal optimization - empirical large deviations rate estimators, and stochastic multi-armed bandits, 2015.
[60] Barry L. Nelson, et al. Discrete Optimization via Simulation, 2015.
[61] Loo Hay Lee, et al. Ranking and Selection: Efficient Simulation Budget Allocation, 2015.
[62] James Zou, et al. Controlling Bias in Adaptive Data Analysis Using Information Theory, 2015, AISTATS.
[63] Vivek F. Farias, et al. Optimistic Gittins Indices, 2016, NIPS.
[64] Aurélien Garivier, et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models, 2014, J. Mach. Learn. Res.
[65] E. Kaufmann. On Bayesian index policies for sequential resource allocation, 2016, arXiv:1601.01190.
[66] Ilya O. Ryzhov,et al. On the Convergence Rates of Expected Improvement Methods , 2016, Oper. Res..
[67] Barry L. Nelson,et al. Indifference-Zone-Free Selection of the Best , 2016, Oper. Res..
[68] Aurélien Garivier,et al. Optimal Best Arm Identification with Fixed Confidence , 2016, COLT.
[69] Diego Klabjan,et al. Improving the Expected Improvement Algorithm , 2017, NIPS.
[70] Susan R. Hunter,et al. Efficient Ranking and Selection in Parallel Computing Environments , 2015, Oper. Res..
[71] David Simchi-Levi,et al. Online Network Revenue Management Using Thompson Sampling , 2017, Oper. Res..