W 2-4150
Aggregation-Disaggregation Algorithm for ε²-Singularly Perturbed Limiting Average Markov Control Problems

Finite state and action Markov Decision Processes (MDPs, for short) are dynamic, stochastic systems controlled by a controller, sometimes referred to as a “decision-maker”. These models have been extensively studied since the 1950s by applied probabilists, operations researchers, and engineers. Engineers typically refer to these models as “Markov control problems”, and in this paper we shall use these labels interchangeably. The early MDP models were studied by Howard [13] and Blackwell [5] and, following the latter, are sometimes referred to as “Discrete Dynamic Programming”. During the 1960s and 1970s the theory of classical MDPs evolved to the extent that there is now a complete existence theory, and a number of good algorithms for computing optimal policies, with respect to criteria such as maximization of the limiting average expected output or the discounted expected output. These models have been applied in a variety of contexts, ranging from water-resource models, through communication networks, to inventory and maintenance models. One class of problems that began to be addressed in recent years focused on the following question:
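To make the limiting average criterion concrete, the following is a minimal illustrative sketch (not taken from the paper): a two-state, two-action MDP in which the long-run average reward of a fixed stationary policy is evaluated via the stationary distribution of the Markov chain that the policy induces. The transition and reward numbers are invented for illustration only.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (numbers are illustrative only).
# P[a][s][s'] = probability of moving s -> s' under action a;
# r[a][s]     = expected one-step reward in state s under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
r = np.array([
    [1.0, 0.0],                  # action 0
    [2.0, 0.5],                  # action 1
])

def limiting_average_reward(policy):
    """Long-run average reward of a deterministic stationary policy.

    `policy[s]` is the action chosen in state s.  The induced chain is
    assumed ergodic, so the limiting average reward equals the stationary
    distribution weighted by the per-state expected rewards.
    """
    n = P.shape[1]
    P_pi = np.array([P[policy[s], s] for s in range(n)])  # induced chain
    r_pi = np.array([r[policy[s], s] for s in range(n)])  # induced rewards
    # Stationary distribution: left eigenvector of P_pi for eigenvalue 1,
    # i.e. right eigenvector of P_pi.T with the largest (real) eigenvalue.
    vals, vecs = np.linalg.eig(P_pi.T)
    mu = np.real(vecs[:, np.argmax(np.real(vals))])
    mu = mu / mu.sum()           # normalize (also fixes the sign)
    return float(mu @ r_pi)

# Evaluate the policy that plays action 1 in state 0 and action 0 in state 1.
g = limiting_average_reward([1, 0])
```

Under this policy the induced chain has stationary distribution (2/7, 5/7), giving an average reward of 4/7; an optimal-policy algorithm would search over all such stationary policies for the largest value of this quantity.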