On Playing Golf with Two Balls
We analyze and solve a game in which a player chooses which of several Markov chains to advance, with the object of minimizing the expected time (or cost) for one of the chains to reach a target state. The solution entails computing (in polynomial time) a function $\gamma$---a variety of "Gittins index"---on the states of the individual chains, the minimization of which produces an optimal strategy.
It turns out that $\gamma$ is a useful cousin of the expected hitting time of a Markov chain but is defined, for example, even for random walks on infinite graphs. We derive the basic properties of $\gamma$ and consider its values in some natural situations.
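To make the game concrete, here is a minimal sketch that solves a tiny two-chain instance by brute-force value iteration over the product state space. This is not the polynomial-time index computation the abstract refers to, and it does not compute $\gamma$; it only illustrates the objective (minimize the expected number of moves until either chain reaches its target). The function name `solve_two_chain_game`, the unit cost per move, and the two example chains are assumptions made up for illustration.

```python
from itertools import product


def solve_two_chain_game(P1, target1, P2, target2, tol=1e-12, max_iter=100_000):
    """Optimal expected number of moves until either chain hits its target.

    P1, P2 are row-stochastic transition matrices (lists of lists).
    Each move costs 1 (an assumption) and advances exactly one chain.
    Returns a dict mapping joint states (x, y) to the optimal expected cost.
    """
    n1, n2 = len(P1), len(P2)
    V = {s: 0.0 for s in product(range(n1), range(n2))}
    for _ in range(max_iter):
        delta = 0.0
        for (x, y) in V:
            if x == target1 or y == target2:
                continue  # game is over; cost-to-go stays 0
            # Expected cost-to-go after advancing chain 1 or chain 2.
            move1 = sum(p * V[(x2, y)] for x2, p in enumerate(P1[x]))
            move2 = sum(p * V[(x, y2)] for y2, p in enumerate(P2[y]))
            new = 1.0 + min(move1, move2)
            delta = max(delta, abs(new - V[(x, y)]))
            V[(x, y)] = new  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    return V


# Hypothetical chain A: 0 -> 1 surely; from 1, hit target 2 w.p. 1/2 or fall
# into a "bad" state 3 that escapes to the target only w.p. 0.1 per step.
# Played alone from state 0, its expected hitting time is 7.
A = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.1, 0.9],
]
# Hypothetical chain B: from 0, hit target 1 w.p. 0.2 per step.
# Played alone, its expected hitting time is 5.
B = [
    [0.8, 0.2],
    [0.0, 1.0],
]

value = solve_two_chain_game(A, 2, B, 1)
print(value[(0, 0)])
```

In this instance the optimal expected cost from the joint start state is about 4.5, better than committing to either chain alone (7 and 5): the player tries chain A first and switches to chain B if A lands in the bad state. The product state space grows multiplicatively with the number of chains, which is why a per-chain index of the kind described in the abstract is attractive.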