Convergence Properties of a Computational Learning Model for Unknown Markov Chains

The increasing complexity of engineering systems has motivated continuing research on computational learning methods toward making autonomous intelligent systems that can learn how to improve their performance over time while interacting with their environment. These systems need not only to sense their environment, but also to integrate information from the environment into all decision-making. The evolution of such systems is modeled as an unknown controlled Markov chain. In previous research, the predictive optimal decision-making (POD) model was developed, aiming to learn in real time the unknown transition probabilities and associated costs over a varying finite time horizon. In this paper, the convergence of the POD to the stationary distribution of a Markov chain is proven, thus establishing the POD as a robust model for making autonomous intelligent systems. This paper also provides the conditions under which the POD model is valid, along with an interpretation of its underlying structure.
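The POD model itself is developed in the paper; purely as a minimal illustration of the kind of convergence at issue, the sketch below simulates a small Markov chain with a hypothetical transition matrix (not taken from the paper), estimates the transition probabilities from observed transition counts as a real-time learner would, and checks that the empirical state-visit frequencies approach the chain's stationary distribution.

```python
import numpy as np

# Hypothetical 3-state transition matrix (illustrative only; not from the paper).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

rng = np.random.default_rng(0)

# Simulate the chain, accumulating transition counts to estimate P online.
counts = np.zeros_like(P)
state = 0
for _ in range(200_000):
    nxt = rng.choice(3, p=P[state])
    counts[state, nxt] += 1
    state = nxt

# Maximum-likelihood estimate of the transition matrix from the counts.
P_hat = counts / counts.sum(axis=1, keepdims=True)

# Stationary distribution: normalized left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

# Empirical visit frequencies converge to the stationary distribution
# for an ergodic chain, by the ergodic theorem for Markov chains.
visits = counts.sum(axis=1) / counts.sum()
print("max |P_hat - P|:", np.max(np.abs(P_hat - P)))
print("max |visits - pi|:", np.max(np.abs(visits - pi)))
```

Both printed deviations shrink as the simulation length grows, which is the elementary analogue of the convergence property the paper establishes for the POD model's learned quantities.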
