Paper information - A lemma on the multiarmed bandit problem

A lemma on the multiarmed bandit problem

We prove a lemma on the optimal value function for the multiarmed bandit problem which provides a simple direct proof of optimality of writeoff policies. This, in turn, leads to a new proof of optimality of the index rule.

J. Tsitsiklis