A nonlinear reinforcement scheme for stochastic learning automata

A stochastic automaton can perform a finite number of actions in a random environment. When a specific action is performed, the environment responds with an output that is stochastically related to the action. This response may be favorable or unfavorable. The aim is to design an automaton that can determine the best action, guided by past actions and responses. The reinforcement scheme presented here is shown to satisfy all necessary and sufficient conditions for absolute expediency in a stationary environment. An automaton using this scheme is guaranteed to "do better" at every time step than at the previous step: the expected value of the average penalty at each iteration step is less than that of the previous step, for all steps. Simulation results are presented which show that our algorithm converges to a solution faster than the one given in [7].
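The abstract does not state the update equations of the nonlinear scheme, so the sketch below does not reproduce it. As a hedged illustration of the quantities involved, it swaps in the classical linear reward-inaction (L_R-I) scheme, a standard absolutely expedient scheme, in a stationary P-model environment where action i is penalized with probability c_i. With action probabilities p_i(n), the average penalty is M(n) = sum_i c_i p_i(n), and absolute expediency requires E[M(n+1) | p(n)] < M(n) whenever the c_i are not all equal and p(n) lies in the interior of the probability simplex. For L_R-I with step size a, this one-step expected change works out to -(a/2) sum_{i,j} p_i p_j (c_i - c_j)^2, which is negative under those conditions. The function names, the penalty probabilities c and the step size a are hypothetical choices made for this sketch.

```python
import random
from typing import List, Sequence, Tuple

# Sketch only: uses the classical linear reward-inaction (L_R-I) scheme,
# NOT the nonlinear scheme of this paper. The penalty probabilities c and
# the step size a are hypothetical illustration values.


def average_penalty(p: Sequence[float], c: Sequence[float]) -> float:
    """M(n) = sum_i p_i(n) * c_i, the expected penalty for probability vector p."""
    return sum(pi * ci for pi, ci in zip(p, c))


def lri_update(p: Sequence[float], action: int, penalized: bool, a: float) -> List[float]:
    """One L_R-I step: a reward shifts mass toward `action`; a penalty leaves p unchanged."""
    if penalized:
        return list(p)
    return [pj + a * (1.0 - pj) if j == action else (1.0 - a) * pj
            for j, pj in enumerate(p)]


def expected_next_penalty(p: Sequence[float], c: Sequence[float], a: float) -> float:
    """E[M(n+1) | p(n) = p] in a stationary P-model environment, computed exactly."""
    m_now = average_penalty(p, c)
    total = 0.0
    for i, pi in enumerate(p):
        rewarded = lri_update(p, i, False, a)
        # Action i is chosen with prob. p_i; it is then penalized with prob. c_i.
        total += pi * ((1.0 - c[i]) * average_penalty(rewarded, c) + c[i] * m_now)
    return total


def simulate(c: Sequence[float], a: float = 0.05, steps: int = 2000,
             seed: int = 0) -> Tuple[List[float], List[float]]:
    """Run an r-action automaton; c[i] is the penalty probability of action i."""
    rng = random.Random(seed)
    r = len(c)
    p = [1.0 / r] * r                      # start from uniform action probabilities
    history = [average_penalty(p, c)]
    for _ in range(steps):
        action = rng.choices(range(r), weights=p)[0]
        penalized = rng.random() < c[action]   # beta(n) = 1 means penalty
        p = lri_update(p, action, penalized, a)
        history.append(average_penalty(p, c))
    return p, history


if __name__ == "__main__":
    c = [0.7, 0.2, 0.5]                    # hypothetical penalty probabilities
    p = [0.2, 0.5, 0.3]
    print(average_penalty(p, c), expected_next_penalty(p, c, a=0.1))
    final_p, history = simulate(c, a=0.05, steps=3000, seed=1)
    print([round(x, 3) for x in final_p], round(history[-1], 3))
```

Running the script prints M(n) and the exact E[M(n+1) | p(n)] for the example vector; the second value is the smaller one, as absolute expediency requires. A single simulated run is still random: M(n) can rise on individual steps, and L_R-I is only ε-optimal, so a run may occasionally lock onto a suboptimal action. Absolute expediency is a statement about the conditional expectation, not about every sample path.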

[1] Olivier Buffet et al., Incremental reinforcement learning for designing multi-agent systems, 2001, AGENTS '01.

[2] S. Lakshmivarahan et al., Absolutely Expedient Learning Algorithms for Stochastic Automata, 1973.

[3] Pushkin Kachroo et al., Simulation study of multiple intelligent vehicle control using stochastic learning automata, 1997.

[4] Carlos Rivero et al., Characterization of the absolutely expedient learning algorithms for stochastic automata in a non-discrete space of actions, 2003, ESANN.

[5] Sridhar Mahadevan et al., Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.

[6] Richard S. Sutton et al., Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[7] Dana Simian et al., Automatic control based on wasp behavioral model and stochastic learning automata, 2008.

[8] John S. Bay et al., Intelligent navigation of autonomous vehicles in an automated highway system: learning methods and interacting vehicles approach, 1997.

[9] P. Kachroo et al., Simulation study of learning automata games in automated highway systems, 1997, Proceedings of Conference on Intelligent Transportation Systems.

[10] Iulian Pah et al., A New Reinforcement Scheme for Stochastic Learning Automata - Application to Automatic Control, 2008, ICE-B.

[11] Yufeng Liu et al., Stochastic Direct Reinforcement: Application to Simple Games with Recurrence, 2004, AAAI Technical Report.

[12] Kumpati S. Narendra et al., Learning automata - an introduction, 1989.

[13] Richard S. Sutton et al., Introduction to Reinforcement Learning, 1998.

[14] Hitoshi Iba et al., Reinforcement Learning Estimation of Distribution Algorithm, 2003, GECCO.

[15] M. Dorigo, Introduction to the Special Issue on Learning Autonomous Robots, 1996.

[16] Pushkin Kachroo et al., Multiple stochastic learning automata for vehicle path control in an automated highway system, 1999, IEEE Trans. Syst. Man Cybern. Part A.

[17] N. Baba, New Topics in Learning Automata Theory and Applications, 1985.