Adaptive control of a partially observed controlled Markov chain

We consider adaptive control of a finite-state controlled Markov chain with partial state information, motivated by a class of replacement problems. We present parameter estimation techniques based on the information available after taking actions that reset the state to a known value. We prove that the parameter estimates converge with probability one (w.p.1) to the true (unknown) parameter under the feedback structure induced by a certainty-equivalent adaptive policy. We also show that the adaptive policy is self-optimizing, in the long-run average sense, for any (measurable) sequence of parameter estimates converging w.p.1 to the true parameter.
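To make the reset-based estimation idea concrete, here is a minimal Python sketch built on a toy model of our own, not the authors' model or estimator. It assumes a two-state machine (0 good, 1 bad, with bad absorbing) that deteriorates with unknown probability θ, produces a defective item with known probability q0 in state 0 or q1 in state 1, and is returned to state 0 by replacement. The function names `estimate_theta` and `run_adaptive`, the fixed belief threshold, the initial guess, and all parameter values are hypothetical. The threshold rule applied to the belief computed under θ̂ stands in for the optimal policy of the estimated model, which the sketch does not compute.

```python
import random


def estimate_theta(n_first: int, d_first: int, q0: float, q1: float,
                   lo: float = 0.01, hi: float = 0.99) -> float:
    """Moment estimator of the (hypothetical) deterioration probability theta.

    One step after a reset to the known state 0, a defect is observed with
    probability q0 + (q1 - q0) * theta, so theta is recovered by inverting the
    empirical first-step defect frequency. Clipping keeps the estimate away
    from 0, where the belief filter below would never flag a fault.
    """
    if n_first == 0:
        return 0.5  # hypothetical initial guess before any post-reset data
    freq = d_first / n_first
    return min(hi, max(lo, (freq - q0) / (q1 - q0)))


def run_adaptive(theta_true: float, q0: float, q1: float, steps: int,
                 threshold: float = 0.5, seed: int = 0):
    """Certainty-equivalent loop: estimate theta, filter the belief, act."""
    rng = random.Random(seed)
    state, belief = 0, 0.0  # start freshly replaced: state known to be 0
    just_reset = True
    n_first = d_first = resets = 0
    theta_hat = estimate_theta(0, 0, q0, q1)
    for _ in range(steps):
        # Hidden transition: good (0) -> bad (1) w.p. theta; bad is absorbing.
        if state == 0 and rng.random() < theta_true:
            state = 1
        # Partial observation: defect w.p. q0 in state 0, q1 in state 1.
        defect = rng.random() < (q1 if state == 1 else q0)
        if just_reset:
            # Only the first observation after a reset feeds the estimator.
            n_first += 1
            d_first += defect
            theta_hat = estimate_theta(n_first, d_first, q0, q1)
        # Bayes filter for P(state = 1), run under the current estimate.
        pred = belief + (1.0 - belief) * theta_hat
        like1, like0 = (q1, q0) if defect else (1.0 - q1, 1.0 - q0)
        belief = pred * like1 / (pred * like1 + (1.0 - pred) * like0)
        # Hypothetical threshold rule on the belief: replace (reset to 0).
        if belief >= threshold:
            state, belief, just_reset = 0, 0.0, True
            resets += 1
        else:
            just_reset = False
    return theta_hat, resets


if __name__ == "__main__":
    est, n = run_adaptive(theta_true=0.1, q0=0.1, q1=0.8, steps=200_000, seed=1)
    print(f"theta_hat={est:.3f} after {n} replacements")
```

In this toy model, each post-reset step starts from the known state 0, so the first-step defect indicators are i.i.d. Bernoulli(q0 + (q1 - q0)θ) no matter when the policy chooses to replace. With θ > 0 and q1 > q0, replacements keep occurring, and the strong law of large numbers then gives θ̂ → θ w.p.1 even though the data are collected under the adaptive feedback loop.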
