Paper Information - Convergence Properties of a Computational Learning Model for Unknown Markov Chains

The increasing complexity of engineering systems has motivated continuing research on computational learning methods for building autonomous intelligent systems that learn to improve their performance over time while interacting with their environment. These systems need not only to sense their environment, but also to integrate information from the environment into all decision making. The evolution of such systems is modeled as an unknown controlled Markov chain. In previous research, the predictive optimal decision-making (POD) model was developed, aiming to learn in real time the unknown transition probabilities and associated costs over a varying finite time horizon. In this paper, the convergence of the POD to the stationary distribution of a Markov chain is proven, thus establishing the POD as a robust model for building autonomous intelligent systems. This paper provides the conditions under which the POD is valid, together with an interpretation of its underlying structure.
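To make the terms in the abstract concrete, the following is a minimal Python sketch of two ideas it mentions: estimating unknown transition probabilities from observed transitions, and computing the stationary distribution of a Markov chain. It is not the POD model. The two-state matrix `TRUE_P`, the function names, the count-based estimator, and the power-iteration routine are all assumptions chosen for illustration, and the chain is uncontrolled for simplicity.

```python
import random


def simulate_chain(P, start, steps, rng):
    """Sample a path of `steps` transitions from row-stochastic matrix P."""
    state = start
    path = [state]
    for _ in range(steps):
        # Draw the next state using the current state's row as weights.
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path


def estimate_transitions(path, n_states):
    """Estimate transition probabilities from empirical transition counts."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    estimate = []
    for row in counts:
        total = sum(row)
        # Fall back to a uniform row if a state was never visited.
        estimate.append([c / total if total else 1.0 / n_states for c in row])
    return estimate


def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution by repeated pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state chain, used only for this illustration.
TRUE_P = [[0.9, 0.1], [0.5, 0.5]]

rng = random.Random(0)
path = simulate_chain(TRUE_P, start=0, steps=20000, rng=rng)
P_hat = estimate_transitions(path, n_states=2)

pi_true = stationary_distribution(TRUE_P)
pi_hat = stationary_distribution(P_hat)
```

For this example chain the balance condition gives a stationary distribution of (5/6, 1/6), and `pi_hat` approaches `pi_true` as the observed path grows. The sketch only illustrates the general notion of a learned model approaching the stationary behavior of the underlying chain; the convergence result and its conditions are those established in the paper.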

Andreas A. Malikopoulos

[1] Panos Y. Papalambros, et al. A State-Space Representation Model and Learning Algorithm for Real-Time Decision-Making Under Uncertainty, 2007.

[2] Pravin Varaiya. Adaptive Control of Markov Chains: A Survey, 1982.

[3] V. Borkar, et al. Adaptive control of Markov chains, I: Finite parameter set, 1979.

[4] P. Mandl, et al. Estimation and control in Markov chains, 1974, Advances in Applied Probability.

[5] John G. Kemeny, et al. Finite Markov chains, 1960.

[6] V. Borkar, et al. Identification and adaptive control of Markov chains, 1982.

[7] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[8] Patchigolla Kiran Kumar, et al. A Survey of Some Results in Stochastic Adaptive Control, 1985.

[9] B. Doshi, et al. Strong consistency of a modified maximum likelihood estimator for controlled Markov chains, 1980.

[10] Mitsuo Sato, et al. Learning control of finite Markov chains with an explicit trade-off between estimation and control, 1988, IEEE Trans. Syst. Man Cybern.

[11] P. Kumar, et al. A new family of optimal adaptive controllers for Markov chains, 1982.