Adaptive Policies in Markov Decision Processes with Uncertain Transition Matrices

This study is concerned with Markov Decision Processes with uncertain transition matrices. In the discounted case, the Bayesian analysis of this model is studied. We define an adaptive policy and a learning policy and show that, for any ε > 0, there exists a policy that is both ε-optimal and learning. In the average case, the non-Bayesian analysis of this model is studied and an optimal adaptive policy is constructed.