Paper information - Adaptive Policies for Sequential Sampling under Incomplete Information and a Cost Constraint

We consider the problem of sequential sampling from a finite number of independent statistical populations to maximize the expected infinite horizon average outcome per period, under a constraint that the expected average sampling cost does not exceed an upper bound. The outcome distributions are not known. We construct a class of consistent adaptive policies, under which the average outcome converges with probability 1 to the true value under complete information for all distributions with finite means. We also compare the rate of convergence for various policies in this class using simulation.
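The "true value under complete information" mentioned in the abstract can be illustrated with a small sketch. It assumes, as one plausible reading and not as the authors' stated formulation, that with known means the benchmark is a linear program over long-run sampling frequencies: maximize the average outcome subject to an average-cost budget. Under that assumption, some optimal solution samples at most two populations, so a direct enumeration suffices. The function name `complete_information_value` and the parameters `means`, `costs`, and `budget` are hypothetical.

```python
def complete_information_value(means, costs, budget):
    """Best long-run average outcome when the means are known.

    Assumed model (not taken from the paper): choose sampling
    frequencies x_i >= 0 with sum(x_i) == 1 that maximize
    sum(x_i * means[i]) subject to sum(x_i * costs[i]) <= budget.
    """
    n = len(means)
    best_value, best_mix = None, None

    # Candidates that sample one affordable population all the time.
    for i in range(n):
        if costs[i] <= budget:
            if best_value is None or means[i] > best_value:
                best_value, best_mix = means[i], {i: 1.0}

    # Candidates that mix a cheaper and a costlier population so that
    # the average cost equals the budget exactly.
    for i in range(n):
        for j in range(n):
            if costs[i] < budget < costs[j]:
                x_i = (costs[j] - budget) / (costs[j] - costs[i])
                value = x_i * means[i] + (1.0 - x_i) * means[j]
                if best_value is None or value > best_value:
                    best_value, best_mix = value, {i: x_i, j: 1.0 - x_i}

    if best_value is None:
        raise ValueError("no population is affordable under this budget")
    return best_value, best_mix


if __name__ == "__main__":
    # Hypothetical numbers: a cheap weak population, a costly strong one.
    value, mix = complete_information_value(
        means=[1.0, 3.0], costs=[1.0, 3.0], budget=2.0
    )
    print(value, mix)
```

With these illustrative inputs, the budget sits halfway between the two costs, so the sketch splits sampling evenly between the two populations. A consistent adaptive policy in the abstract's sense would have to approach this benchmark without knowing `means` in advance.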

Apostolos Burnetas | Odysseas Kanavetas

[1] H. Robbins, et al. Sequential choice from several populations, 1995, Proceedings of the National Academy of Sciences of the United States of America.

[2] Alexander S. Poznyak, et al. Self-Learning Control of Finite Markov Chains, 2000.

[3] A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.

[4] You-Gan Wang. Gittins indices and constrained allocation in clinical trials, 1991.

[5] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.

[6] Sanjeev R. Kulkarni, et al. Finite-time lower bounds for the two-armed bandit problem, 2000, IEEE Trans. Autom. Control.

[7] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[8] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.

[9] Russell Greiner, et al. The Budgeted Multi-armed Bandit Problem, 2004, COLT.

[10] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[11] Hamid Pezeshk, et al. Sample Size Determination in Clinical Trials, 1999.