Paper information - On finite memory solutions to the two-armed bandit problem (Corresp.)

The least upper bound on the asymptotic proportion of choices of the correct coin achievable by expedient finite-memory algorithms in certain two-armed bandit problems is derived, and schemes that achieve this bound in a limiting sense are displayed. A deterministic automaton whose performance is close to optimal is also presented.

B. Chandrasekaran | K. B. Lakshmanan

[1] Thomas M. Cover, et al. Optimal Finite Memory Learning Algorithms for the Finite Sample Problem, 1976, Inf. Control.

[2] T. Cover, et al. Learning with Finite Memory, 1970.

[3] H. Robbins, et al. A Sequential Decision Problem with a Finite Memory, 1956, Proceedings of the National Academy of Sciences of the United States of America.

[4] B. Chandrasekaran, et al. Finite memory multiple hypothesis testing: Close-to-optimal schemes for Bernoulli problems, 1978, IEEE Trans. Inf. Theory.

[5] I. Witten. The apparent conflict between estimation and control—a survey of the two-armed bandit problem, 1976.

[6] Thomas M. Cover, et al. The two-armed-bandit problem with time-invariant finite memory, 1970, IEEE Trans. Inf. Theory.

[7] Ian H. Witten. Finite-Time Performance of Some Two-Armed Bandit Controllers, 1973, IEEE Trans. Syst. Man Cybern.

[8] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[9] King-Sun Fu, et al. Formulation of learning automata and automata games, 1969, Inf. Sci.