A pattern-matrix learning algorithm for adaptive MDPs: The regularly communicating case (Theory and Application of Decision Analysis in Uncertain Situation)

In this note, as a sequel to our previous work [7], we are concerned with adaptive models for uncertain Markov decision processes with a regularly communicating structure, in which the state space is decomposed into a single communicating class and an absolutely transient class. We give a pattern-matrix learning algorithm that finds the regularly communicating structure, and we use it to construct an asymptotic sequence of adaptive policies with nearly average-optimal properties. A numerical experiment is given.

Keywords: optimal adaptive policy, regularly communicating case.
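
To make the decomposition concrete, the following is a minimal sketch of how a known 0/1 pattern matrix could be split into a closed communicating class and the remaining states. It is not the learning algorithm of this note. It assumes, hypothetically, that `pattern[i][j]` is 1 when some action moves state `i` to state `j` with positive probability. The names `reachability` and `split_states` are illustrative, and the sketch does not check that the remaining states are absolutely transient, that is, transient under every policy.

```python
def reachability(pattern):
    """Transitive closure of the pattern matrix (Warshall's algorithm).

    reach[i][j] is True when state j can be reached from state i in zero
    or more steps. Assumption: pattern[i][j] is truthy when some action
    gives a positive transition probability from i to j.
    """
    n = len(pattern)
    reach = [[bool(pattern[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def split_states(pattern):
    """Split states into closed-class states and the remaining states.

    A state i is placed in `closed` when every state reachable from i can
    reach i again. `single` reports whether all closed states communicate
    with each other, i.e. whether they form one communicating class.
    """
    reach = reachability(pattern)
    n = len(reach)
    closed = [i for i in range(n)
              if all(reach[j][i] for j in range(n) if reach[i][j])]
    transient = [i for i in range(n) if i not in closed]
    single = all(reach[i][j] for i in closed for j in closed)
    return closed, transient, single


# Hypothetical 3-state example: states 0 and 1 communicate, state 2 leaks into them.
example = [[1, 1, 0],
           [1, 0, 0],
           [0, 1, 1]]
closed, transient, single = split_states(example)
```

In the adaptive setting the pattern matrix is not known in advance and has to be learned from observed transitions. The sketch only illustrates the structure that such a learned matrix would encode.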