Suboptimal Policies, with Bounds, for Parameter Adaptive Decision Processes

A parameter adaptive decision process is a sequential decision process in which some parameter or parameter set affecting the rewards and/or transitions of the process is not known with certainty. As time progresses, the decision maker can process signals from the system's performance to infer which parameter set is operative. Active learning is an essential feature of these processes: the decision maker must choose actions that simultaneously guide the system in a preferred direction and yield information that can be used to better prescribe future actions. If the operative parameter set is known with certainty, the parameter adaptive problem reduces to a conventional stochastic dynamic program, which is presumed solvable. Previous authors have shown how to use these solutions to generate suboptimal policies with performance bounds for the parameter adaptive problem. Here it is shown that some desirable characteristics of those bounds are shared by a larger cla...
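The construction described above can be illustrated with a minimal sketch. The MDP below (two states, two actions, an unknown reward parameter theta) is hypothetical, and the specific bound shown — belief-weighted certainty-equivalent optimal values — is one standard instance of using the known-parameter solutions, not the paper's exact construction:

```python
import numpy as np

# Hypothetical MDP: transitions P[a, s, s'] are known; the reward table
# R[theta][a, s] depends on an unknown parameter theta in {0, 1}.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # action 1
R = {0: np.array([[1.0, 0.0], [0.0, 1.0]]),
     1: np.array([[0.0, 1.0], [1.0, 0.0]])}
gamma = 0.9

def value_iteration(theta, tol=1e-8):
    """Optimal values and Q-values if theta were known with certainty."""
    V = np.zeros(2)
    while True:
        Q = R[theta] + gamma * (P @ V)   # Q[a, s]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q
        V = V_new

V_opt = {t: value_iteration(t)[0] for t in (0, 1)}
Q_opt = {t: value_iteration(t)[1] for t in (0, 1)}

def upper_bound(belief):
    """Belief-weighted known-parameter optima bound the adaptive optimum,
    since a clairvoyant decision maker can only do better."""
    return belief[0] * V_opt[0] + belief[1] * V_opt[1]

def ce_action(belief, s):
    """Suboptimal policy: act greedily on the belief-averaged Q-values."""
    Q = belief[0] * Q_opt[0] + belief[1] * Q_opt[1]
    return int(Q[:, s].argmax())

def bayes_update(belief, s, a, r):
    """Posterior over theta after observing reward r in (s, a);
    rewards here are deterministic, so likelihoods are 0/1."""
    like = np.array([float(R[t][a, s] == r) for t in (0, 1)])
    post = belief * like
    return post / post.sum() if post.sum() > 0 else belief

belief = np.array([0.5, 0.5])
print(upper_bound(belief), ce_action(belief, 0))
```

Observing a single reward here identifies theta exactly (the posterior collapses to a point mass), after which the policy coincides with the conventional stochastic dynamic program for that parameter set.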
