On using distribution theory to prove the epsilon-optimality of stubborn learning mechanisms

The authors consider the problem of a learning mechanism that must identify the optimal action offered by a random environment. The mechanism presented is defined by an action-probability updating rule and is therefore a variable-structure stochastic automaton. The machine is essentially stubborn: once it has chosen a particular action, it increases the probability of choosing that action irrespective of whether the response from the environment was favorable or unfavorable. However, this increase is carried out in a systematic way, so that the machine learns, in an epsilon-optimal fashion, the best action that the environment offers. The proposed mechanism thus serves as a model for an epsilon-optimal stubbornly learning system. Apart from the proof that the machine is epsilon-optimal, a major contribution of the present work is that the mathematical tools used in this proof (namely the theory of distributions, kernels, and topological spaces) are quite distinct from those currently used in the field of learning. Also presented are simulation results that demonstrate the properties of the mechanism and compare it to the traditional L_RI (linear reward-inaction) scheme.
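To make the contrast with L_RI concrete, the following is a minimal sketch, not the authors' updating rule, which the abstract does not state. The L_RI update is the standard one: on a reward the chosen action's probability moves toward 1, and on a penalty nothing changes. The "stubborn" variant shown here is a hypothetical illustration that simply assumes a smaller, nonzero step on a penalty; the function names, step sizes, and reward probabilities are all assumptions made for the example.

```python
import random


def update(p, chosen, rewarded, reward_step, penalty_step):
    """Shift probability mass toward the chosen action.

    penalty_step == 0 gives the standard L_RI scheme (no change on penalty).
    penalty_step > 0 is a HYPOTHETICAL stubborn-style rule: the chosen
    action is reinforced even after an unfavorable response.
    """
    step = reward_step if rewarded else penalty_step
    q = [(1.0 - step) * pj for pj in p]  # shrink every probability
    q[chosen] += step                    # give the freed mass to the chosen action
    return q


def simulate(reward_probs, reward_step, penalty_step, steps=5000, seed=0):
    """Run the automaton against a stationary random environment.

    reward_probs[i] is the (assumed) probability that action i is rewarded.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    p = [1.0 / n] * n
    for _ in range(steps):
        chosen = rng.choices(range(n), weights=p)[0]
        rewarded = rng.random() < reward_probs[chosen]
        p = update(p, chosen, rewarded, reward_step, penalty_step)
    return p


if __name__ == "__main__":
    env = [0.8, 0.6]  # assumed reward probabilities; action 0 is the better one
    print("L_RI     :", simulate(env, reward_step=0.01, penalty_step=0.0))
    print("stubborn :", simulate(env, reward_step=0.01, penalty_step=0.005))
```

Under this assumed rule with two actions, the expected change in the probability of the first action is proportional to (reward_step - penalty_step) times the difference in reward probabilities, so the drift favors the better action only when the reward step exceeds the penalty step. Whether the published mechanism uses this form is not stated in the abstract.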
