Singularly perturbed Markov control problem: Limiting average cost

In this paper we consider a singularly perturbed Markov decision process (MDP) with the limiting average cost criterion. We assume that the underlying process is composed of n separate irreducible processes, and that the small perturbation is such that it “unites” these processes into a single irreducible process. We formulate the underlying control problem for the singularly perturbed MDP, and call it the “limit Markov control problem” (limit MCP). We prove the validity of the “limit control principle,” which states that, for any sufficiently small perturbation, an optimal solution of the perturbed MDP can be approximated by an optimal solution of the limit MCP. We also demonstrate that the limit Markov control problem is equivalent to a suitably constructed nonlinear program in the space of long-run state-action frequencies. This approach combines the solutions of the original separated irreducible MDPs with the stationary distribution of a certain “aggregated MDP,” and creates a framework for future algorithmic approaches.
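To make the “uniting” effect of the perturbation concrete, the following is a minimal numerical sketch (not from the paper): two separate irreducible two-state chains are arranged block-diagonally, so the unperturbed chain is not irreducible, and a generator-like perturbation matrix D with zero row sums links the blocks. For any ε > 0, P + εD is a single irreducible chain with a unique stationary distribution. The matrices P and D and the perturbation sizes are illustrative assumptions chosen for simplicity.

```python
import numpy as np

# Two separate irreducible 2-state chains, arranged block-diagonally.
# The combined chain is NOT irreducible: states {0,1} never reach {2,3}.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])

# Perturbation with zero row sums that links the two blocks, so that
# P + eps*D remains stochastic and is irreducible for any eps > 0.
D = np.array([
    [-0.1,  0.0,  0.1,  0.0],
    [ 0.0, -0.1,  0.0,  0.1],
    [ 0.1,  0.0, -0.1,  0.0],
    [ 0.0,  0.1,  0.0, -0.1],
])

def stationary(P):
    """Stationary distribution: normalized left eigenvector for eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

# The perturbed chain has a unique stationary distribution for every eps > 0,
# even though the unperturbed block-diagonal chain does not.
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, np.round(stationary(P + eps * D), 4))
```

In this symmetric example P + εD is doubly stochastic, so the stationary distribution is uniform for every ε > 0; more generally, its ε → 0 limit weights the stationary distributions of the individual blocks by an aggregated chain over the blocks, which is the structure the limit MCP exploits.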