Decentralized learning in finite Markov chains

The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralized decision maker is associated with each state in which two or more actions (decisions) are available. Each decision maker uses a simple learning scheme, requiring minimal information, to update its action choice. It is shown that, if updating is done in sufficiently small steps, the group will converge to the policy that maximizes the long-term expected reward per step. The analysis is based on learning in sequential stochastic games and on certain properties, derived in this paper, of ergodic Markov chains.
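The setup described above can be illustrated with a minimal Python sketch. The abstract does not name the update rule, so the sketch assumes a linear reward-inaction scheme. It also assumes that each decision maker's feedback is the average reward per step earned since its state was last visited, with rewards scaled to [0, 1]. The names `lri_update` and `simulate`, the step size, and the data layout are hypothetical and are not taken from the paper.

```python
import random


def lri_update(probs, chosen, response, step):
    """Linear reward-inaction update (assumed scheme, not confirmed by the abstract).

    probs    -- current action probabilities of one decision maker
    chosen   -- index of the action that was played
    response -- feedback in [0, 1]; 0 leaves the probabilities unchanged
    step     -- small step size in (0, 1)
    """
    gain = step * response
    return [
        p + gain * (1.0 - p) if i == chosen else p * (1.0 - gain)
        for i, p in enumerate(probs)
    ]


def simulate(transitions, rewards, step=0.01, n_steps=20000, seed=0):
    """Run one decision maker per state on a finite Markov chain.

    transitions[s][a] -- list of next-state probabilities (unknown to the learners)
    rewards[s][a][s2] -- reward in [0, 1] for moving from s to s2 under action a
    Returns the learned action probabilities for every state.
    """
    rng = random.Random(seed)
    n_states = len(transitions)
    probs = [[1.0 / len(transitions[s])] * len(transitions[s]) for s in range(n_states)]
    last_action = [None] * n_states      # action played at the last visit
    last_visit = [0] * n_states          # time of the last visit
    reward_at_visit = [0.0] * n_states   # cumulative reward at the last visit
    total_reward = 0.0
    state = 0
    for t in range(n_steps):
        if last_action[state] is not None:
            # Feedback: average reward per step since this state was last visited.
            elapsed = t - last_visit[state]
            response = (total_reward - reward_at_visit[state]) / elapsed
            probs[state] = lri_update(probs[state], last_action[state], response, step)
        # Only the decision maker of the current state acts.
        action = rng.choices(range(len(probs[state])), weights=probs[state])[0]
        last_action[state] = action
        last_visit[state] = t
        reward_at_visit[state] = total_reward
        next_state = rng.choices(range(n_states), weights=transitions[state][action])[0]
        total_reward += rewards[state][action][next_state]
        state = next_state
    return probs
```

In this sketch each decision maker sees only its own action and a scalar feedback value, never the transition probabilities or the choices made in other states. That is one possible reading of the "minimal information" requirement. The `step` parameter plays the role of the small update step on which the convergence claim depends.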
