Bayesian dynamic programming

We consider a non-stationary Bayesian dynamic decision model with general state, action and parameter spaces. It is shown that this model can be reduced to a non-Markovian (resp. Markovian) decision model with completely known transition probabilities. Under rather weak convergence assumptions on the expected total rewards, some general results are presented concerning the restriction to deterministic generalized Markov policies, the criteria of optimality, and the existence of Bayes policies. These results are based on the above transformations and on results of Hinderer and Schäl.
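
As an informal illustration of the reduction to a Markovian model with known transition probabilities, the following minimal sketch treats a toy problem that is far more special than the general setting above: a stationary, finite-horizon choice between a safe action and a Bernoulli action with unknown success probability. The Beta(1, 1) prior, the safe reward of 1/2, and the names `bayes_value`, `SAFE_REWARD`, `PRIOR_A` and `PRIOR_B` are assumptions made only for this example. The observed counts (successes, failures) serve as an augmented state, and the posterior predictive probability gives a fully known transition law, so ordinary backward induction applies.

```python
from fractions import Fraction
from functools import lru_cache

SAFE_REWARD = Fraction(1, 2)  # hypothetical known reward of the safe action
PRIOR_A, PRIOR_B = 1, 1       # hypothetical Beta(1, 1) prior on the unknown parameter


def bayes_value(horizon, successes=0, failures=0):
    """Maximal expected total reward with `horizon` stages left,
    starting from the given observed counts (the augmented state)."""

    @lru_cache(maxsize=None)
    def value(n, s, f):
        if n == 0:
            return Fraction(0)
        # Posterior predictive probability of a success given the counts.
        # It depends only on (s, f), so the transition law is known.
        p = Fraction(PRIOR_A + s, PRIOR_A + PRIOR_B + s + f)
        # Safe action: known reward, no new information, counts unchanged.
        safe = SAFE_REWARD + value(n - 1, s, f)
        # Risky action: reward 1 on success, 0 on failure; counts are updated.
        risky = (p * (1 + value(n - 1, s + 1, f))
                 + (1 - p) * value(n - 1, s, f + 1))
        return max(safe, risky)

    return value(horizon, successes, failures)
```

In this sketch, a policy that maximizes the right-hand side at each augmented state is a Bayes policy for the toy problem; with two stages left, the risky action is strictly preferred at the prior because the information it yields has value.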