On the optimal allocation of two or more treatments in a controlled clinical trial

SUMMARY

A model for the allocation problem in a controlled clinical trial is proposed and is more general than the two-armed bandit. The trial is allowed to involve more than two treatments and experiments to have a large set of possible outcomes. The optimal strategy for the allocation problem is straightforward to compute.

There has been since the early 1940's an increasing use of controlled medical trials to compare the effectiveness of different therapeutic or prophylactic treatments. Concern has been expressed about the ethics of such trials by Hill (1963) and others, and latterly increased attention has been given to the design of trials which, while being statistically informative, have desirable 'ethical' properties. One such property might be that as few patients as possible are treated badly during such a trial.

The two-armed bandit problem is the simplest problem of real interest in this area of experimental design. The problem is how to carry out a series of experiments using two different treatments, the outcome of each experiment being either success or failure, in such a way as to maximize the number of successes achieved. The probabilities of achieving success are unknown and in general different for the two treatments, which may be used in any order. The records of success and failure which build up as experiments are carried out clearly should influence the choice of treatments.

Bellman (1956) gave two Bayesian formulations of the two-armed bandit problem. The first assumed an infinite patient horizon in which rewards or successes in the future were discounted. Gittins & Jones (1972) derived the form of the optimal strategy for a multiarmed bandit problem, that is one involving any finite number of treatments, in the infinite horizon case. Their work provides the theoretical foundation for the present paper and is outlined in §2.

Bellman's second formulation assumed a finite patient horizon. As yet no simple form for an optimal strategy for this problem exists. Because of this, many writers have examined the properties of 'sensible' strategies, which have normally been variants of the play-the-winner rule; some examples are given by Robbins (1952) and Fox (1974). A simple extension of the