[1] Daniel Russo, et al. Simple Bayesian Algorithms for Best Arm Identification, 2016, COLT.
[2] Aurélien Garivier, et al. Optimal Best Arm Identification with Fixed Confidence, 2016, COLT.
[3] John N. Tsitsiklis, et al. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, 2004, J. Mach. Learn. Res.
[4] D. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies, 1974.
[5] Annie Liang, et al. Dynamically Aggregating Diverse Information, 2019, EC.
[6] Dean Karlan, et al. Adaptive Experimental Design Using the Propensity Score, 2009.
[7] Peter W. Glynn, et al. A large deviations perspective on ordinal optimization, 2004, Proceedings of the 2004 Winter Simulation Conference.
[8] J. Robins, et al. Double/Debiased Machine Learning for Treatment and Structural Parameters, 2017.
[9] Junpei Komiyama, et al. Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling, 2021, arXiv:2109.08229.
[10] Alessandro Lazaric, et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, 2012, NIPS.
[11] David Sontag, et al. Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models, 2019, ICML.
[12] Susan Athey, et al. Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits, 2021, KDD.
[13] Michal Valko, et al. Fixed-Confidence Guarantees for Bayesian Best-Arm Identification, 2019, AISTATS.
[14] Keisuke Hirano, et al. Asymptotic analysis of statistical decision rules in econometrics, 2020.
[15] R. Ellis, et al. Large deviations for a general class of random vectors, 1984.
[16] Xiequan Fan, et al. Cramér large deviation expansions for martingales under Bernstein's condition, 2012, arXiv:1210.2198.
[17] John Langford, et al. Doubly Robust Policy Evaluation and Learning, 2011, ICML.
[18] M. J. van der Laan, et al. Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy, 2016, Annals of Statistics.
[19] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[20] Zhengyuan Zhou, et al. Online Multi-Armed Bandits with Adaptive Inference, 2021, NeurIPS.
[21] Dominik D. Freydenberger, et al. Can We Learn to Gamble Efficiently?, 2010, COLT.
[22] J. Honda, et al. Adaptive Experimental Design for Efficient Treatment Effect Estimation: Randomized Allocation via Contextual Bandit Algorithm, 2020, arXiv.
[23] Diego Klabjan, et al. Improving the Expected Improvement Algorithm, 2017, NIPS.
[24] Matthew Malloy, et al. lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits, 2013, COLT.
[25] Nando de Freitas, et al. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning, 2014, AISTATS.
[26] Oren Somekh, et al. Almost Optimal Exploration in Multi-Armed Bandits, 2013, ICML.
[27] Stefan Wager, et al. Confidence intervals for policy evaluation in adaptive experiments, 2021, Proceedings of the National Academy of Sciences.
[28] Ion Grama, et al. Large deviations for martingales via Cramér's method, 2000.
[29] Wouter M. Koolen, et al. Non-Asymptotic Pure Exploration by Solving Games, 2019, NeurIPS.
[30] G. Imbens, et al. Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score, 2000.
[31] Ambuj Tewari, et al. PAC Subset Selection in Stochastic Multi-armed Bandits, 2012, ICML.
[32] Chun-Hung Chen, et al. Simulation Budget Allocation for Further Enhancing the Efficiency of Ordinal Optimization, 2000, Discret. Event Dyn. Syst.
[33] Antoine Chambaz, et al. Post-Contextual-Bandit Inference, 2021, NeurIPS.
[34] Jonathan D. Cryer. Time Series Analysis, 1986.
[35] Miroslav Dudík, et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits, 2016, ICML.
[36] J. Gärtner. On Large Deviations from the Invariant Measure, 1977.
[37] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[38] Maximilian Kasy, et al. Adaptive Treatment Assignment in Experiments for Policy Choice, 2019, Econometrica.
[39] Lalit Jain, et al. An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits, 2020, NeurIPS.
[40] Aurélien Garivier, et al. On the Complexity of A/B Testing, 2014, COLT.
[41] Nikos Vlassis, et al. More Efficient Off-Policy Evaluation through Regularized Targeted Learning, 2019, ICML.
[42] K. Hirano, et al. Asymptotics for Statistical Treatment Rules, 2009.
[43] Robert E. Bechhofer, et al. Sequential identification and ranking procedures: with special reference to Koopman-Darmois populations, 1970.
[44] M. J. van der Laan. The Construction and Analysis of Adaptive Group Sequential Designs, 2008.
[45] David Childers, et al. Efficient Online Estimation of Causal Effects by Deciding What to Observe, 2021, NeurIPS.
[46] Alexandra Carpentier, et al. Tight (Lower) Bounds for the Fixed Budget Best Arm Identification Bandit Problem, 2016, COLT.
[47] I. Johnstone, et al. Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means, 1982.
[48] J. Hahn. On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects, 1998.
[49] Rémi Munos, et al. Pure exploration in finitely-armed and continuous-armed bandits, 2011, Theor. Comput. Sci.
[50] Shota Yasui, et al. Efficient Counterfactual Learning from Bandit Feedback, 2018, AAAI.
[51] Wonyoung Kim, et al. Doubly Robust Thompson Sampling for Linear Payoffs, 2021, arXiv.
[52] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.
[53] M. J. van der Laan. Online Targeted Learning, 2014.
[54] Aurélien Garivier, et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models, 2014, J. Mach. Learn. Res.
[55] T. L. Lai and H. Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[56] Junpei Komiyama, et al. Optimal Simple Regret in Bayesian Best Arm Identification, 2021.
[57] A. Zeevi, et al. Online Ordinal Optimization under Model Misspecification, 2021.
[58] Max Tabord-Meehan. Stratification Trees for Adaptive Randomization in Randomized Controlled Trials, 2018, The Review of Economic Studies.