Adaptive control of constrained finite Markov chains

An adaptive control algorithm is presented for constrained finite controlled Markov chains with unknown transition probabilities. A finite set of algebraic constraints is considered, and the resulting constrained optimization problem is solved with the Lagrange multipliers approach. At each time n, the scheme estimates the control policy on the basis of the Bush-Mosteller scheme, which is related to stochastic approximation procedures. We present the asymptotic properties of the algorithm (convergence and the order of the convergence rate). These follow from the law of large numbers for dependent sequences, martingale theory and Lyapunov function analysis.
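As a hedged illustration of the Bush-Mosteller ingredient mentioned above, the Python sketch below implements one commonly stated form of the Bush-Mosteller update for a randomized policy over N actions: p_{n+1} = p_n + γ_n [e(u_n) − p_n + ξ_n (e^N − N e(u_n))/(N − 1)], where e(u_n) is the unit vector of the chosen action u_n, e^N is the all-ones vector and ξ_n ∈ [0, 1] is a normalized loss. The function names, the rescaling of a Lagrangian cost into [0, 1], the two-action toy environment and every numerical value are assumptions made for illustration only. The full algorithm (per-state policies for a Markov chain, the handling of the constraints and of the Lagrange multipliers) is not reproduced, and the multiplier λ is simply held fixed.

```python
import random


def bush_mosteller_step(p, action, xi, gamma):
    """One Bush-Mosteller update of a randomized policy p (a probability vector).

    xi is the normalized loss in [0, 1] observed after playing `action`
    (0 = best outcome, 1 = worst outcome); gamma in (0, 1] is the step size.
    With these ranges the updated vector stays on the probability simplex.
    """
    n = len(p)
    if n < 2 or not 0 <= action < n:
        raise ValueError("need at least two actions and a valid action index")
    if not 0.0 <= xi <= 1.0 or not 0.0 < gamma <= 1.0:
        raise ValueError("xi must lie in [0, 1] and gamma in (0, 1]")
    new_p = []
    for j, pj in enumerate(p):
        e_j = 1.0 if j == action else 0.0
        # Component j of e(u) - p + xi * (e^N - N e(u)) / (N - 1).
        direction = e_j - pj + xi * (1.0 - n * e_j) / (n - 1)
        new_p.append(pj + gamma * direction)
    return new_p


def normalized_lagrangian_loss(phi0, phis, lambdas):
    """Hypothetical normalization of a Lagrangian cost into [0, 1].

    Assumes phi0 and every phis[m] lie in [0, 1] and every lambdas[m] >= 0, so
    phi0 + sum(lambda_m * phi_m) lies in [0, 1 + sum(lambdas)].
    """
    value = phi0 + sum(lam * phi for lam, phi in zip(lambdas, phis))
    return value / (1.0 + sum(lambdas))


def simulate(steps=20000, lam=0.5, seed=0):
    """Hypothetical two-action environment with Bernoulli cost/constraint signals."""
    rng = random.Random(seed)
    cost_prob = [0.2, 0.6]        # P(phi0 = 1 | action), assumed values
    constraint_prob = [0.4, 0.3]  # P(phi1 = 1 | action), assumed values
    p = [0.5, 0.5]
    for n in range(steps):
        u = 0 if rng.random() < p[0] else 1
        phi0 = 1.0 if rng.random() < cost_prob[u] else 0.0
        phi1 = 1.0 if rng.random() < constraint_prob[u] else 0.0
        xi = normalized_lagrangian_loss(phi0, [phi1], [lam])
        # Decreasing step sizes, as in stochastic approximation.
        p = bush_mosteller_step(p, u, xi, gamma=1.0 / (n + 10))
    return p
```

Running `simulate()` shows the stochastic-approximation character of the update: in the two-action case the mean drift of p[0] is c1 − (c0 + c1) p[0], where c_u is the mean normalized loss of action u, so p[0] settles near c1/(c0 + c1) ≈ 0.65 for the values above. That point is expedient rather than optimal, which is why a raw update of this kind is only one component of an adaptive scheme for the constrained problem.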
