Analysis of an adaptive control scheme for a partially observed controlled Markov chain

The authors consider an adaptive finite state controlled Markov chain with partial state information, motivated by a class of replacement problems. They present parameter estimation techniques based on the information available after actions that reset the state to a known value are taken. It is proved that the parameter estimates converge w.p.1 to the true (unknown) parameter, under the feedback structure induced by a certainty equivalent adaptive policy. It is shown that the adaptive policy is self-optimizing in a long-run average sense, for any (measurable) sequence of parameter estimates converging w.p.1 to the true parameter.
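As a rough illustration of estimating a parameter from the information available after a reset, the sketch below uses a simplified two-state model that is not taken from the paper. It assumes that a reset puts the chain in a known "good" state 0, that the chain then moves to the "bad" state 1 with unknown probability theta, and that the single observation made after the reset is wrong with a known probability q < 0.5. Under these assumptions P(y = 1) = q + theta(1 - 2q), which suggests a method-of-moments estimate. The names `estimate_theta` and `simulate_post_reset_obs` are hypothetical, and the estimator is a stand-in for, not a reproduction of, the authors' scheme.

```python
import random


def estimate_theta(post_reset_obs, q):
    """Method-of-moments estimate of theta from post-reset observations.

    Assumed model (illustrative only, not the paper's):
      P(y = 1) = q + theta * (1 - 2q), with known error rate q < 0.5.
    """
    if not post_reset_obs:
        raise ValueError("need at least one post-reset observation")
    if not 0.0 <= q < 0.5:
        raise ValueError("q must lie in [0, 0.5)")
    freq = sum(post_reset_obs) / len(post_reset_obs)
    theta_hat = (freq - q) / (1.0 - 2.0 * q)
    # Project onto [0, 1] so the estimate is always a valid probability.
    return min(1.0, max(0.0, theta_hat))


def simulate_post_reset_obs(theta, q, n, seed=0):
    """Simulate n independent reset cycles, one noisy observation each."""
    rng = random.Random(seed)
    obs = []
    for _ in range(n):
        # One transition from the known post-reset state 0.
        state = 1 if rng.random() < theta else 0
        # The observation is flipped with probability q.
        flipped = rng.random() < q
        obs.append(state ^ int(flipped))
    return obs


if __name__ == "__main__":
    data = simulate_post_reset_obs(theta=0.3, q=0.1, n=20000, seed=1)
    print(estimate_theta(data, q=0.1))
```

In this toy setting the post-reset observations are independent and identically distributed, so the estimate converges w.p.1 by the strong law of large numbers. In the adaptive problem the reset times depend on the policy, which is what makes the convergence analysis in the paper nontrivial.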
