On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes

We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and a finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that certain characteristics induced in the original problem by the countability of the spaces involved carry over to the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties of average cost optimal policies are obtained for a two-state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available.
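The completely observed reformulation above replaces the core state with a belief (a probability distribution over core states), propagated by Bayes' rule after each action and observation. For a two-state replacement problem the belief reduces to a single number p = P(machine is bad). The following is a minimal sketch of that belief update, with entirely hypothetical parameter values (the degradation probability `theta` and the defect rates `q0`, `q1` are illustrative assumptions, not taken from the paper):

```python
# Hypothetical two-state machine replacement POMDP.
# Core states: 0 = "good", 1 = "bad"; the belief is p = P(bad).
# A good machine degrades with probability theta per period; a bad
# machine stays bad. Each period an item is inspected:
# P(defect | good) = q0, P(defect | bad) = q1 (assumed values).
theta, q0, q1 = 0.1, 0.05, 0.6

def belief_update(p, defect_observed):
    """One-step Bayes update of p = P(state = bad) under the
    'operate' action, given whether the produced item was defective."""
    # Prediction step: probability of being bad after the transition.
    p_pred = p + (1 - p) * theta
    # Correction step: likelihood of the observation in each state.
    like_bad = q1 if defect_observed else 1 - q1
    like_good = q0 if defect_observed else 1 - q0
    num = like_bad * p_pred
    return num / (num + like_good * (1 - p_pred))
```

A control-limit (threshold) policy of the kind whose optimality is studied here would then replace the machine exactly when the updated belief p exceeds some threshold p*.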
