[1] H. Robbins,et al. Sequential choice from several populations , 1995, Proceedings of the National Academy of Sciences of the United States of America.
[2] Alexander S. Poznyak,et al. Self-Learning Control of Finite Markov Chains , 2000 .
[3] A. Burnetas,et al. Optimal Adaptive Policies for Sequential Allocation Problems , 1996 .
[4] You-Gan Wang. Gittins indices and constrained allocation in clinical trials , 1991 .
[5] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[6] Sanjeev R. Kulkarni,et al. Finite-time lower bounds for the two-armed bandit problem , 2000, IEEE Trans. Autom. Control.
[7] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[8] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res.
[9] Russell Greiner,et al. The Budgeted Multi-armed Bandit Problem , 2004, COLT.
[10] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[11] Hamid Pezeshk,et al. Sample Size Determination in Clinical Trials , 1999 .