Learning automata with changing number of actions
A reinforcement scheme that is based on the linear reward-inaction updating algorithm is presented for a learning automaton whose action set changes from instant to instant. A learning automaton using the algorithm is shown to be both absolutely expedient and ε-optimal. The simulation results verify the ε-optimality of the algorithm. The results can be extended to the design of general nonlinear absolutely expedient learning algorithms.
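As an illustration of the idea summarized above, the following is a minimal sketch of one way a linear reward-inaction update could be restricted to the actions available at a given instant. It is not taken from the paper: the rescaling of probabilities over the available subset, the function names `lri_step`, `choose`, and `simulate`, the step size `a`, and the rule that each action is available with probability 0.7 are all assumptions made for the example.

```python
import random


def lri_step(p, available, chosen, rewarded, a=0.1):
    """One reward-inaction update over the currently available actions.

    p         : list of action probabilities over the full action set
    available : indices of actions available at this instant (assumed non-empty)
    chosen    : index of the action that was selected (must be in `available`)
    rewarded  : True if the environment returned a favourable response
    a         : step size in (0, 1) (hypothetical default)
    """
    if not rewarded:
        # Inaction: probabilities stay unchanged on an unfavourable response.
        return list(p)
    # Total probability mass of the available actions, used for rescaling.
    k = sum(p[i] for i in available)
    q = list(p)
    for i in available:
        scaled = p[i] / k  # probability conditioned on the available subset
        if i == chosen:
            scaled = scaled + a * (1.0 - scaled)
        else:
            scaled = (1.0 - a) * scaled
        q[i] = scaled * k  # map back to the full action set
    return q


def choose(p, available, rng):
    """Sample an available action in proportion to its current probability."""
    weights = [p[i] for i in available]
    return rng.choices(available, weights=weights, k=1)[0]


def simulate(reward_probs, steps=5000, a=0.05, seed=0):
    """Run the sketch against a stationary environment (assumed setup)."""
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r
    for _ in range(steps):
        # Assumption: each action is independently available with prob. 0.7.
        available = [i for i in range(r) if rng.random() < 0.7]
        if not available:
            continue
        chosen = choose(p, available, rng)
        rewarded = rng.random() < reward_probs[chosen]
        p = lri_step(p, available, chosen, rewarded, a)
    return p
```

In this sketch the probabilities of unavailable actions are left untouched and the total mass of the available subset is preserved, so the full vector remains a probability distribution after every step. Calling `simulate([0.2, 0.8, 0.5])` returns the final probability vector for a hypothetical three-action environment.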
[1] C. L. Mallows et al. Individual Choice Behaviour. 1961.
[2] M. L. Tsetlin. On the Behavior of Finite Automata in Random Media. 1961.
[3] S. Lakshmivarahan et al. Absolutely Expedient Learning Algorithms for Stochastic Automata. 1973.
[4] S. Lakshmivarahan et al. Learning Algorithms Theory and Applications. 1981.
[5] Yann LeCun et al. Learning on automata networks. 1987.