Multi-Armed Bandit Problem Revisited

In this paper, we revisit aspects of the multi-armed bandit problem studied in earlier work (Ref. 1). An alternative proof of the optimality of the Gittins index rule is given under the discounted reward criterion; the proof does not rely on an explicit interchange argument. The ideas of the proof are then extended to establish the asymptotic optimality of the index rule under the average reward criterion. Problems involving superprocesses and arm-acquiring bandits are also reexamined, and the properties of an optimal policy for an arm-acquiring bandit are discussed.
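
For orientation, the following is a commonly used form of the quantity underlying the index rule; the notation ($x_t$ for the state of a single arm, $r(\cdot)$ for its one-step reward, $\beta \in (0,1)$ for the discount factor, $\tau$ for a stopping time) is illustrative and is not taken from Ref. 1.

% Standard form of the Gittins index of an arm in state x under discounting;
% the supremum is over stopping times tau >= 1 of the arm's own filtration.
\begin{equation*}
  \nu(x) \;=\; \sup_{\tau \ge 1}
  \frac{\mathbb{E}\!\left[\,\sum_{t=0}^{\tau-1} \beta^{t} r(x_t) \;\middle|\; x_0 = x \right]}
       {\mathbb{E}\!\left[\,\sum_{t=0}^{\tau-1} \beta^{t} \;\middle|\; x_0 = x \right]}.
\end{equation*}

Under the discounted criterion, the index rule engages at each stage an arm whose current index $\nu(\cdot)$ is largest.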