Learning automata possessing ergodicity of the mean: The two-action case

Learning automata which update their action probabilities on the basis of the responses they get from the environment are considered. These automata update the probabilities whether the environment responds with a reward or a penalty. An automaton is said to possess ergodicity of the mean (EM) if the mean action probability is the state probability of an ergodic Markov chain. The only known EM algorithm is the linear reward-penalty (LRP) scheme. For the two-action case, necessary and sufficient conditions are derived for nonlinear updating schemes to be EM. A method of controlling the rate of convergence of these schemes is presented. In particular, a generalized linear algorithm is proposed which is superior to the LRP scheme. An expression for the variance of the limiting action probabilities of this scheme is derived.
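For concreteness, the following is a minimal sketch (not from the paper) of the classical two-action LRP update that serves as the point of departure here; the symmetric learning parameter lam and the environment's penalty probabilities c are hypothetical values chosen only for the simulation.

```python
import random

def lrp_update(p, action, reward, lam=0.05):
    """One LRP step for a two-action automaton.

    p      : current probability of choosing action 0
    action : 0 or 1, the action just taken
    reward : True if the environment rewarded the action
    lam    : learning parameter in (0, 1); LRP uses the same
             step size for reward and penalty responses
    """
    if reward:
        # move probability mass toward the rewarded action
        p = p + lam * (1 - p) if action == 0 else (1 - lam) * p
    else:
        # move probability mass away from the penalized action
        p = (1 - lam) * p if action == 0 else p + lam * (1 - p)
    return p

# Simulate against a stationary random environment where action i
# is penalized with probability c[i] (illustrative values).
c = [0.2, 0.6]
p, rng = 0.5, random.Random(0)
for _ in range(10_000):
    a = 0 if rng.random() < p else 1
    rewarded = rng.random() >= c[a]
    p = lrp_update(p, a, rewarded)
print(f"final Pr(action 0) = {p:.3f}")
```

Because the scheme updates on both reward and penalty, p never absorbs at 0 or 1; it keeps fluctuating, and only its mean settles, which is the EM behavior the abstract refers to.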