SUMMARY

A model for the allocation problem in a controlled clinical trial is proposed which is more general than the two-armed bandit. The trial is allowed to involve more than two treatments, and the experiments may have a large set of possible outcomes. The optimal strategy for the allocation problem is straightforward to compute.

There has been since the early 1940's an increasing use of controlled medical trials to compare the effectiveness of different therapeutic or prophylactic treatments. Concern has been expressed about the ethics of such trials by Hill (1963) and others, and latterly increased attention has been given to the design of trials which, while being statistically informative, have desirable 'ethical' properties. One such property might be that as few patients as possible are treated badly during such a trial.

The two-armed bandit problem is the simplest problem of real interest in this area of experimental design. The problem is how to carry out a series of experiments using two different treatments, the outcome of each experiment being either success or failure, in such a way as to maximize the number of successes achieved. The probabilities of achieving success are unknown and in general different for the two treatments, which may be used in any order. The records of success and failure which build up as experiments are carried out clearly should influence the choice of treatments.

Bellman (1956) gave two Bayesian formulations of the two-armed bandit problem. The first assumed an infinite patient horizon in which rewards or successes in the future were discounted. Gittins & Jones (1972) derived the form of the optimal strategy for a multiarmed bandit problem, that is one involving any finite number of treatments, in the infinite horizon case. Their work provides the theoretical foundation for the present paper and is outlined in §2. Bellman's second formulation assumed a finite patient horizon.
As yet no simple form for an optimal strategy for this problem exists. Because of this, many writers have examined the properties of 'sensible' strategies, which have normally been variants of the play-the-winner rule; some examples are given by Robbins (1952) and Fox (1974). A simple extension of the
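
As an illustration only, and not taken from the paper, the two-armed bandit setting and the basic play-the-winner rule mentioned above can be sketched in a few lines of Python. The sketch assumes Bernoulli outcomes with fixed success probabilities that are known to the simulation but not to the experimenter. The function name `play_the_winner`, its parameters, and the cyclic switching order used when there are more than two treatments are hypothetical choices made for this example.

```python
import random


def play_the_winner(success_probs, n_patients, seed=0):
    """Simulate a basic play-the-winner allocation (illustrative sketch).

    Rule assumed here: after a success, give the same treatment to the
    next patient; after a failure, switch to the next treatment in
    cyclic order. Outcomes are Bernoulli with the given probabilities.
    Returns (successes, trials), one count per treatment.
    """
    rng = random.Random(seed)      # seeded so that runs are reproducible
    k = len(success_probs)
    successes = [0] * k
    trials = [0] * k
    arm = 0                        # assumption: start with the first treatment
    for _ in range(n_patients):
        trials[arm] += 1
        if rng.random() < success_probs[arm]:
            successes[arm] += 1    # success: stay on the same treatment
        else:
            arm = (arm + 1) % k    # failure: move to the next treatment
    return successes, trials


if __name__ == "__main__":
    # Hypothetical success probabilities for two treatments.
    wins, uses = play_the_winner([0.7, 0.4], n_patients=100)
    print("successes per treatment:", wins)
    print("patients per treatment: ", uses)
```

Under this rule the treatment with the higher success probability tends to receive more patients, since runs on it are ended less often by a failure. The rule uses only the most recent outcome rather than the full record of successes and failures, so it is best read as one of the 'sensible' strategies rather than an optimal one.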
[1] R. Bellman. A Problem in the Sequential Design of Experiments, 1954.
[2] F. J. Anscombe. Sequential Medical Trials, 1963.
[3] M. Zelen, et al. Play the Winner Rule and the Controlled Clinical Trial, 1969.
[4] K. Hinderer, et al. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, 1970.
[5] George H. Weiss, et al. A two-stage procedure for choosing the better of two binomial populations, 1972.
[6] B. Fox. Finite Horizon Behavior of Policies for Two-Arm Bandits, 1974.
[7] P. Armitage, et al. Sequential medical trials, 2nd edition, 1975.
[8] L. A. Klimko, et al. Bayesian rules for the two-armed bandit problem, 1977.
[9] J. Klein. Medical Ethics and Controlled Clinical Trials, 1979, The Annals of Otology, Rhinology & Laryngology. Supplement.
[10] H. Robbins. Some aspects of the sequential design of experiments, 1952.