Switching control in multi-mode Markov decision processes

This paper presents a switching control strategy for multi-mode Markov decision processes. The system to be controlled is modeled as a finite-state controlled Markov chain whose mode evolves stochastically. Although the system state is observable, the mode is only partially observable: it is known only when it lies in a given set of observable modes. Given a set of controllers for the system, we consider the problem of determining a switching rule that selects the controller to apply each time the system mode is observable. The objective is to minimize the long-term average cost incurred by the system while satisfying bounds on the long-term averages of other given performance measures. Since the multi-mode model parameters are assumed unknown a priori, an adaptive switching rule is required. Algorithms are presented for computing approximations to the optimal switching rule by estimating the model parameters online. The approach is illustrated on an example of dynamic power management of hard disk drives in computer systems.
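
To make the estimate-then-switch structure concrete, here is a minimal, hypothetical Python sketch of an adaptive switching rule. It is not the paper's algorithm: rather than re-solving a constrained MDP from estimated transition and cost parameters, it simply tracks empirical per-controller averages online and, at each observable epoch, picks the cheapest controller whose estimated constraint average satisfies the bound. The class name, method names, and the epsilon-exploration scheme are all illustrative assumptions.

```python
# Hypothetical sketch of an adaptive switching rule for a multi-mode system.
# Assumption: a supervisor observes the mode only at certain epochs, chooses a
# controller there, and later learns the cost, auxiliary (constraint) measure,
# and elapsed time accrued until the next observable epoch.
import random
from collections import defaultdict

class AdaptiveSwitcher:
    def __init__(self, controllers, constraint_bound, explore=0.1):
        self.controllers = controllers      # candidate controller ids
        self.bound = constraint_bound       # bound on the auxiliary average
        self.explore = explore              # exploration probability
        self.cost_sum = defaultdict(float)  # accumulated cost per controller
        self.aux_sum = defaultdict(float)   # accumulated constraint measure
        self.time = defaultdict(float)      # time spent under each controller

    def select(self):
        """Choose a controller at an epoch where the mode is observable."""
        untried = [c for c in self.controllers if self.time[c] == 0]
        if untried or random.random() < self.explore:
            return random.choice(untried or self.controllers)
        # Certainty-equivalence step: treat empirical averages as the truth.
        avg_cost = {c: self.cost_sum[c] / self.time[c] for c in self.controllers}
        avg_aux = {c: self.aux_sum[c] / self.time[c] for c in self.controllers}
        feasible = [c for c in self.controllers if avg_aux[c] <= self.bound]
        pool = feasible or self.controllers  # fall back if none look feasible
        return min(pool, key=lambda c: avg_cost[c])

    def record(self, controller, cost, aux, duration):
        """Log what accrued under `controller` until the next observable epoch."""
        self.cost_sum[controller] += cost
        self.aux_sum[controller] += aux
        self.time[controller] += duration
```

In use, the supervisor would call select() whenever the mode becomes observable, run the chosen controller until the next observable epoch, and feed the accrued cost, constraint measure, and elapsed time back through record(). The frequency-based rule here only conveys the overall online-estimation-and-switching structure; the paper's algorithms instead approximate the optimal switching rule from the estimated model parameters.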
