On computing the expected discounted return in a Markov chain
The discounted return associated with a finite-state Markov chain X₁, X₂, … is given by g(X₁) + αg(X₂) + α²g(X₃) + …, where g(x) represents the immediate return from state x and α is the discount factor. Knowing the transition matrix of the chain, it is desired to compute the expected discounted return (present worth) given the initial state. This type of problem arises in inventory theory, dynamic programming, and elsewhere.
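For reference, the expected return can be written as the solution of a linear system. The symbols v (the vector of expected discounted returns by initial state) and P (the transition matrix) are notation assumed here for illustration, as is the condition 0 ≤ α < 1; none of them is fixed by the abstract itself.

```latex
v(i) = \mathbb{E}\!\left[\sum_{n=1}^{\infty} \alpha^{\,n-1} g(X_n) \,\middle|\, X_1 = i\right],
\qquad
v = g + \alpha P v
\;\Longleftrightarrow\;
(I - \alpha P)\, v = g .
```

Under the assumed condition 0 ≤ α < 1, the matrix I − αP is nonsingular, so the system has a unique solution.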
Usually the solution is approximated by solving the system of linear equations characterizing the expected return. These equations can be solved by a variety of well-known methods. This paper describes yet another method, which is a slight modification of the classical iterative scheme. The method gives sequences of upper and lower bounds which converge monotonically to the solution. Hence, the method is relatively free of error control problems.
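The idea of an iteration that carries its own error bounds can be illustrated with a minimal Python sketch. It is not necessarily the exact modification described in the paper: it uses the standard successive-approximation step v ← g + αPv together with the well-known bounds obtained from the smallest and largest components of the change between iterates, and it assumes 0 < α < 1. The function name `discounted_return_bounds` and the example chain are hypothetical.

```python
def discounted_return_bounds(P, g, alpha, tol=1e-10, max_iter=10_000):
    """Iterate v <- g + alpha * P v, returning lower and upper bounds on the solution.

    P     : transition matrix as a list of rows (each row sums to 1)
    g     : list of immediate returns, one per state
    alpha : discount factor, assumed to satisfy 0 < alpha < 1
    """
    n = len(g)
    v = list(g)                      # arbitrary starting guess
    scale = alpha / (1.0 - alpha)
    lower, upper = list(v), list(v)
    for _ in range(max_iter):
        # One classical iteration step: a single matrix-vector product.
        v_next = [g[i] + alpha * sum(P[i][j] * v[j] for j in range(n))
                  for i in range(n)]
        # Change between successive iterates.
        d = [v_next[i] - v[i] for i in range(n)]
        # Shifting the new iterate by the extreme changes brackets the solution.
        lo_shift = scale * min(d)
        hi_shift = scale * max(d)
        lower = [x + lo_shift for x in v_next]
        upper = [x + hi_shift for x in v_next]
        v = v_next
        # The gap between the bounds is a built-in stopping criterion.
        if hi_shift - lo_shift <= tol:
            break
    return lower, upper


if __name__ == "__main__":
    # Hypothetical two-state example.
    P = [[0.9, 0.1],
         [0.5, 0.5]]
    g = [1.0, 2.0]
    lower, upper = discounted_return_bounds(P, g, alpha=0.9)
    for i, (lo, hi) in enumerate(zip(lower, upper)):
        print(f"state {i}: {lo:.8f} <= v <= {hi:.8f}")
```

In this sketch the lower bounds never decrease and the upper bounds never increase from one iteration to the next, so the width of the bracket reports the remaining error directly. Each step costs one matrix-vector product, on the order of N² operations for a dense N-state chain.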
Computational experiments suggest that, for problems with a large number of states, the method is quite efficient. The amount of computation required to obtain the solution grows much more slowly with the number of states, N, than it does with conventional methods. In fact, computation time is more nearly proportional to N² than to N³.