[1] M. Rothschild. A two-armed bandit theory of market pricing , 1974 .
[2] Brian Gluss. A Note on a Computational Approximation to the Two-Machine Problem , 1962, Inf. Control.
[3] Richard Steck. A dynamic programming strategy for the two machine problem , 1964 .
[4] S. Yakowitz. Mathematics of adaptive control processes , 1969 .
[5] Dorian Feldman. Contributions to the "Two-Armed Bandit" Problem , 1962 .
[6] Bradley P. Carlin,et al. Bayesian Adaptive Methods for Clinical Trials , 2010 .
[7] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[8] José Niño-Mora,et al. Computing a Classic Index for Finite-Horizon Bandits , 2011, INFORMS J. Comput.
[9] Jack Bowden,et al. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges. , 2015, Statistical science : a review journal of the Institute of Mathematical Statistics.
[10] K. Glazebrook. On Randomized Dynamic Allocation Indices for the Sequential Design of Experiments , 1980 .
[11] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[12] Paul Fearnhead,et al. On the Identification and Mitigation of Weaknesses in the Knowledge Gradient Policy for Multi-Armed Bandits , 2016, ArXiv.
[13] J. Gittins,et al. The Learning Component of Dynamic Allocation Indices , 1992 .
[14] P. Whittle. Restless bandits: activity allocation in a changing world , 1988, Journal of Applied Probability.
[15] A. Shwartz,et al. The Poisson Equation for Countable Markov Chains: Probabilistic Methods and Interpretations , 2002 .
[16] J. Higgins,et al. Cochrane Handbook for Systematic Reviews of Interventions , 2010, International Coaching Psychology Review.
[17] A. Burnetas,et al. Optimal Adaptive Policies for Sequential Allocation Problems , 1996 .
[18] Sophie Ahrens,et al. Recommender Systems , 2012 .
[19] John R. Birge,et al. An Approximation Approach for Response Adaptive Clinical Trial Design , 2020 .
[20] R. Bellman. A PROBLEM IN THE SEQUENTIAL DESIGN OF EXPERIMENTS , 1954 .
[21] Quentin F. Stout,et al. New adaptive designs for delayed response models , 2006 .
[22] Aurélien Garivier,et al. Learning the distribution with largest mean: two bandit frameworks , 2017, ArXiv.
[23] W. R. Thompson. On the Theory of Apportionment , 1935 .
[24] A. V. den Boer,et al. Dynamic Pricing and Learning: Historical Origins, Current Research, and New Directions , 2013 .
[25] A. Kesselheim,et al. Adaptive design clinical trials: a review of the literature and ClinicalTrials.gov , 2018, BMJ Open.
[26] Murray K. Clayton,et al. Small-sample performance of Bernoulli two-armed bandit Bayesian strategies , 1999 .
[27] Marek Petrik,et al. Value Directed Exploration in Multi-Armed Bandits with Structured Priors , 2017, UAI.
[28] J. Matthews,et al. Randomization in Clinical Trials: Theory and Practice , 2003 .
[29] L. J. Wei,et al. The Randomized Play-the-Winner Rule in Medical Trials , 1978 .
[30] Glen L. Urban,et al. Morphing Theory and Applications , 2017 .
[31] Tomasz Burzykowski,et al. Adaptive Randomization of Neratinib in Early Breast Cancer. , 2016, The New England journal of medicine.
[32] M. Zelen,et al. Play the Winner Rule and the Controlled Clinical Trial , 1969 .
[33] George H. Weiss,et al. A two-stage procedure for choosing the better of two binomial populations , 1972 .
[34] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[35] T. Lai,et al. Optimal learning and experimentation in bandit problems , 2000 .
[36] S. Villar. BANDIT STRATEGIES EVALUATED IN THE CONTEXT OF CLINICAL TRIALS IN RARE LIFE-THREATENING DISEASES , 2017, Probability in the Engineering and Informational Sciences.
[37] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[38] Peter Whittle,et al. Applied Probability in Great Britain , 2002, Oper. Res.
[39] E. Kaufmann. On Bayesian index policies for sequential resource allocation , 2016, 1601.01190.
[40] R. Weber,et al. On an index policy for restless bandits , 1990, Journal of Applied Probability.
[41] J. Banks,et al. Denumerable-Armed Bandits , 1992 .
[42] Thomas A. Kelley. A Note on the Bernoulli Two-Armed Bandit Problem , 1974 .
[43] Donald A. Berry,et al. Modified Two-Armed Bandit Strategies for Certain Clinical Trials , 1978 .
[44] Christian M. Ernst,et al. Multi-armed Bandit Allocation Indices , 1989 .
[45] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem , 1987 .
[46] D. Berry. A Bernoulli Two-armed Bandit , 1972 .
[47] George H. Weiss,et al. Recent results on using the play the winner sampling rule with binomial selection problems , 1972 .
[48] P. Thall,et al. Practical Bayesian adaptive randomisation in clinical trials. , 2007, European journal of cancer.
[49] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn.
[50] F. Kelly. Multi-Armed Bandits with Discount Factor Near One: The Bernoulli Case , 1981 .
[51] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[52] Diane Uschner,et al. Randomization: The forgotten component of the randomized clinical trial , 2018, Statistics in medicine.
[53] Michael Hogarth,et al. Adaptive Randomization of Neratinib in Early Breast Cancer. , 2016, The New England journal of medicine.
[54] John R. Birge,et al. Response-adaptive designs for clinical trials: Simultaneous learning from multiple patients , 2016, Eur. J. Oper. Res.
[55] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[56] Warren B. Powell,et al. Optimal Learning , 2022, Encyclopedia of Machine Learning and Data Mining.
[57] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[58] Paul Resnick,et al. Recommender systems , 1997, CACM.
[59] R. N. Bradt,et al. On Sequential Designs for Maximizing the Sum of $n$ Observations , 1956 .
[60] José Niño-Mora. Computing a Classic Index for Finite-Horizon Bandits , 2011 .
[61] Donald A. Berry,et al. Optimal adaptive randomized designs for clinical trials , 2007 .
[62] Warren B. Powell,et al. A Knowledge-Gradient Policy for Sequential Information Collection , 2008, SIAM J. Control. Optim.
[63] Abdel Hamid. Randomized sequential decision rules : application to the multi-armed bandit problem and the secretary problem , 1981 .
[64] J. Bather. Randomized Allocation of Treatments in Sequential Experiments , 1981 .
[65] Thomas Jaki,et al. A Bayesian adaptive design for clinical trials in rare diseases , 2016, Comput. Stat. Data Anal.
[66] Tze Leung Lai,et al. Incomplete learning from endogenous data in dynamic allocation , 1999 .
[67] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[68] H Robbins,et al. Sequential choice from several populations. , 1995, Proceedings of the National Academy of Sciences of the United States of America.
[69] Donald A. Berry,et al. Bandit Problems: Sequential Allocation of Experiments. , 1986 .
[70] D. Berry,et al. Choosing sample size for a clinical trial using decision analysis , 2003 .
[71] J. Bather,et al. Multi‐Armed Bandit Allocation Indices , 1990 .
[72] J. Bather. Randomised allocation of treatments in sequential trials , 1980, Advances in Applied Probability.
[73] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .