Local and Global Optimization Algorithms for Generalized Learning Automata

This paper analyzes the long-term behavior of REINFORCE and related algorithms (Williams 1986, 1988, 1992) applied to generalized learning automata (Narendra and Thathachar 1989) in the associative reinforcement learning problem (Barto and Anandan 1985). The learning system considered here is a feedforward connectionist network of generalized learning automata units. We show that REINFORCE is a gradient ascent algorithm but can exhibit unbounded behavior. A modified version of the algorithm, based on constrained optimization techniques, is suggested to overcome this disadvantage and is shown to exhibit local optimization properties. A global version of the algorithm, based on constant-temperature heat bath techniques, is also described and shown to converge to the global maximum. All algorithms are analyzed using weak convergence techniques.
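
To make the three variants named in the abstract concrete, the following is a minimal sketch for a single Bernoulli-logistic unit, not the paper's formulation. The function name `reinforce_step`, the parameters `bound` and `noise`, and the reward interface are all hypothetical. Clipping the weights to a box stands in for the constrained optimization technique, and the Gaussian perturbation scaled by the square root of the step size is one assumed way to mimic a constant-temperature heat bath.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def reinforce_step(w, x, reward_fn, rng, lr=0.1, bound=None, noise=0.0):
    """One REINFORCE-style update for a single Bernoulli-logistic unit.

    bound: if set, weights are clipped to [-bound, bound]
           (an assumed stand-in for the constrained variant).
    noise: if positive, adds Gaussian noise scaled by sqrt(lr)
           (an assumed stand-in for the heat bath variant).
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    y = 1 if rng.random() < p else 0  # stochastic action
    r = reward_fn(x, y)               # scalar reinforcement
    new_w = []
    for wi, xi in zip(w, x):
        # (y - p) * xi is the derivative of ln P(y | x, w) w.r.t. w_i.
        wi = wi + lr * r * (y - p) * xi
        if noise > 0.0:
            wi += math.sqrt(lr) * noise * rng.gauss(0.0, 1.0)
        if bound is not None:
            wi = max(-bound, min(bound, wi))
        new_w.append(wi)
    return new_w
```

With `bound=None` and `noise=0.0` this reduces to the plain update, whose weights are free to grow without limit; setting `bound` keeps them in a compact set, and a positive `noise` lets the iterate leave a local maximum at the cost of a noisier trajectory.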

[1]  G. McCormick Second Order Conditions for Constrained Minima , 1967 .

[3]  E. M. L. Beale,et al.  Nonlinear Programming: A Unified Approach. , 1970 .

[4]  M. Hirsch,et al.  Differential Equations, Dynamical Systems, and Linear Algebra , 1974 .

[5]  Kumpati S. Narendra,et al.  Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..

[6]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[7]  Harold J. Kushner,et al.  Approximation and Weak Convergence Methods for Random Processes , 1984 .

[8]  P. Anandan,et al.  Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[9]  F. Aluffi-Pentini,et al.  Global optimization and stochastic differential equations , 1985 .

[10]  S. Geman,et al.  Diffusions for global optimizations , 1986 .

[11]  W. Grassman Approximation and Weak Convergence Methods for Random Processes with Applications to Stochastic Systems Theory (Harold J. Kushner) , 1986 .

[12]  C. Hwang,et al.  Diffusion for global optimization in R n , 1987 .

[13]  S. Mitter,et al.  Simulated annealing with noisy or imprecise energy measurements , 1989 .

[14]  Mandayam A. L. Thathachar,et al.  Learning the global maximum with parameterized learning automata , 1995, IEEE Trans. Neural Networks.

[15]  Mandayam A. L. Thathachar,et al.  Learning automata in feedforward connectionist systems , 1996, Int. J. Syst. Sci..

[16]  Stuart Geman  Diffusions for Global Optimization , 2022 .