A lemma on the multiarmed bandit problem
We prove a lemma on the optimal value function for the multiarmed bandit problem that provides a simple, direct proof of the optimality of writeoff policies. This, in turn, leads to a new proof of the optimality of the index rule.