Adaptive control of discounted Markov decision chains

In this paper, we consider discounted-reward, finite-state Markov decision processes that depend on unknown parameters. We propose an adaptive policy inspired by the nonstationary value iteration scheme of Federgruen and Schweitzer (Ref. 1). This policy is briefly compared with the principle of estimation and control recently obtained by Schäl (Ref. 4).
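
To make the idea concrete, the following is a minimal sketch of an adaptive policy built on nonstationary value iteration. It is not the paper's construction: it assumes the unknown parameter is the transition law itself, that rewards are known, and that the law is estimated by smoothed empirical frequencies. The names `nvi_step`, `normalize`, and `run_adaptive_policy` are hypothetical. At each stage the controller refreshes its estimate, performs a single value-iteration backup under that estimate, and applies the action that is greedy in the current state.

```python
import random


def nvi_step(v, rewards, transitions, beta):
    """One value-iteration backup under the current model estimate.

    rewards[s][a] is the one-step reward, transitions[a][s][j] the estimated
    probability of moving from s to j under a, and v the previous iterate.
    Returns the new iterate and a greedy action for each state.
    """
    new_v, greedy = [], []
    for s in range(len(v)):
        q = [
            rewards[s][a] + beta * sum(p * vj for p, vj in zip(transitions[a][s], v))
            for a in range(len(transitions))
        ]
        best = max(range(len(q)), key=q.__getitem__)
        new_v.append(q[best])
        greedy.append(best)
    return new_v, greedy


def normalize(counts):
    """Turn transition counts[a][s][j] into estimated probabilities."""
    return [[[c / sum(row) for c in row] for row in per_action] for per_action in counts]


def run_adaptive_policy(true_p, rewards, beta, horizon, seed=0):
    """Interleave estimation, one backup, and control for `horizon` steps."""
    rng = random.Random(seed)
    n_actions, n_states = len(true_p), len(true_p[0])
    # Assumption: start every count at 1 so the first estimate is a valid distribution.
    counts = [[[1.0] * n_states for _ in range(n_states)] for _ in range(n_actions)]
    v = [0.0] * n_states
    state = 0
    for _ in range(horizon):
        p_hat = normalize(counts)                      # current parameter estimate
        v, greedy = nvi_step(v, rewards, p_hat, beta)  # one backup, model changes each step
        action = greedy[state]
        # The true law is used only to simulate the system, never by the controller.
        next_state = rng.choices(range(n_states), weights=true_p[action][state])[0]
        counts[action][state][next_state] += 1.0
        state = next_state
    return v, normalize(counts)
```

The backup is "nonstationary" in the sense that the model used in `nvi_step` changes from one iteration to the next as the estimate is updated. A purely greedy controller such as this one need not visit every state-action pair often enough for the estimates to converge, so any convergence claim would depend on conditions the sketch does not enforce.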