Combinatorial Design of a Stochastic Markov Decision Process

We consider a problem in which we seek to optimally design a Markov decision process (MDP). That is, subject to resource constraints, we first design the action sets that will be available in each state when we later optimally control the process. The control policy is subject to additional constraints governing state-action pair frequencies, and we allow randomized policies. When the design decision is made, some of the parameters governing the MDP are uncertain, but we assume that a distribution for these stochastic parameters is known. We focus on transient MDPs with a finite number of states and actions. We formulate, analyze, and solve a two-stage stochastic integer program that yields an optimal design. A simple example threads its way through the paper to illustrate the development. The paper concludes with a larger application involving optimal design of malaria intervention strategies in Nigeria.
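
To make the structure described above concrete, the following is one plausible way to write a two-stage design model of this kind. It is a sketch under stated assumptions, not necessarily the formulation used in the paper. All symbols are introduced here for illustration only: binary design variables $z_{s,a}$ (action $a$ is made available in state $s$), design costs $c_{s,a}$ and budget $b$, scenario $\omega$ with rewards $r^{\omega}_{s,a}$ and transition probabilities $p^{\omega}(s \mid s', a)$, initial distribution $\alpha_s$ over the transient states, state-action frequencies $x_{s,a}$, side-constraint data $d^{k}_{s,a}$ and $f_k$, and upper bounds $M_{s,a}$ on the frequencies, which are assumed to exist because the process is transient.

```latex
% First stage: choose the action sets subject to a resource budget.
\begin{align*}
\max_{z}\quad & \mathbb{E}_{\omega}\bigl[\, h(z,\omega) \,\bigr] \\
\text{s.t.}\quad & \sum_{s}\sum_{a} c_{s,a}\, z_{s,a} \le b, \\
& z_{s,a} \in \{0,1\} \quad \forall\, s, a.
\end{align*}

% Second stage: optimally control the designed MDP in scenario \omega,
% written as a linear program in state-action frequencies x_{s,a}.
\begin{align*}
h(z,\omega) \;=\; \max_{x}\quad & \sum_{s}\sum_{a} r^{\omega}_{s,a}\, x_{s,a} \\
\text{s.t.}\quad & \sum_{a} x_{s,a} \;-\; \sum_{s'}\sum_{a} p^{\omega}(s \mid s', a)\, x_{s',a} \;=\; \alpha_{s}
  \quad \forall\, s, \\
& \sum_{s}\sum_{a} d^{k}_{s,a}\, x_{s,a} \le f_{k} \quad \forall\, k, \\
& 0 \le x_{s,a} \le M_{s,a}\, z_{s,a} \quad \forall\, s, a.
\end{align*}
```

In this sketch, the coupling constraint $x_{s,a} \le M_{s,a} z_{s,a}$ forces the frequency of any action that was not designed into the process to zero. A randomized policy can be recovered from a second-stage solution by choosing action $a$ in state $s$ with probability $x_{s,a} / \sum_{a'} x_{s,a'}$ whenever the denominator is positive, which is why side constraints on state-action frequencies fit naturally into the linear program.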
