Learning the global maximum with parameterized learning automata

A feedforward network whose units are teams of parameterized learning automata is considered as a model of a reinforcement learning system. The internal state vector of each learning automaton is updated by an algorithm consisting of a gradient-following term and a random perturbation term. It is shown that the algorithm converges weakly to a solution of the Langevin equation, which implies that the algorithm globally maximizes an appropriate function. The algorithm is decentralized: the units exchange no information during updating. Simulation results on common payoff games and pattern recognition problems show that reasonable rates of convergence can be obtained.
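The update described in the abstract, a gradient-following term plus a random perturbation applied to each automaton's internal state, can be illustrated with a minimal sketch on a two-player common payoff game. Everything specific here is an assumption made for illustration and is not taken from the paper: the softmax map from internal state to action probabilities, the step size `b`, the noise scale `sigma`, the clipping bound that keeps the state bounded, and the helper names `softmax`, `sample`, and `run_team`. Each automaton sees only its own action and the shared reinforcement signal, so no information is exchanged between units.

```python
import math
import random


def softmax(u):
    """Map an internal state vector to action probabilities (assumed form)."""
    m = max(u)
    e = [math.exp(x - m) for x in u]
    s = sum(e)
    return [x / s for x in e]


def sample(p, rng):
    """Draw an action index from the probability vector p."""
    r = rng.random()
    acc = 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1


def run_team(payoff, steps=20000, b=0.05, sigma=0.1, bound=5.0, seed=0):
    """Two automata play a common payoff game with a Langevin-style update.

    payoff[i][j] is the probability of a reward when player 0 picks i
    and player 1 picks j. All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    states = [[0.0] * len(payoff), [0.0] * len(payoff[0])]
    for _ in range(steps):
        probs = [softmax(u) for u in states]
        actions = [sample(p, rng) for p in probs]
        # Common binary reinforcement shared by both automata.
        beta = 1.0 if rng.random() < payoff[actions[0]][actions[1]] else 0.0
        for u, p, a in zip(states, probs, actions):
            for i in range(len(u)):
                # Gradient-following term: beta * d ln p[a] / d u[i].
                grad = beta * ((1.0 if i == a else 0.0) - p[i])
                # Random perturbation term, scaled by sqrt(b).
                noise = math.sqrt(b) * sigma * rng.gauss(0.0, 1.0)
                # Clipping is an assumed stand-in for keeping the state bounded.
                u[i] = min(bound, max(-bound, u[i] + b * grad + noise))
    return [softmax(u) for u in states]


if __name__ == "__main__":
    # Hypothetical game with two locally optimal joint actions, (0, 0) and (1, 1).
    game = [[0.9, 0.2], [0.2, 0.6]]
    print(run_team(game))
```

With `sigma` set to zero the sketch reduces to a pure gradient-following scheme, which can settle on the inferior joint action; the perturbation term is what gives the state a chance to leave such a local maximum.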
