Reinforcement learning with internal-dynamics-based exploration using a chaotic neural network

This paper proposes a novel concept in which exploration, which is essential in reinforcement learning, is regarded as one aspect of the motions generated by the learner's internal dynamics and is expected to develop through learning into more purposeful, higher dynamic functions such as "thinking". To realize this concept, a chaotic neural network is introduced that generates motions with exploratory components derived from its internal chaotic dynamics, without adding external random noise. Effective exploration is expected from the dynamics known as "chaotic itinerancy", which is also expected to be the key to more easily learning higher dynamic functions that require both stable and transitive dynamics. The paper also proposes a reinforcement learning method that uses no external random noise, based on the temporal difference (TD) error of the state value and a contribution trace of each input to the output increase in each neuron. In a simple learning task, it was confirmed that, with a chaotic neural network, an agent could explore in accordance with its internal chaotic dynamics and learn goal-directed behaviors. Although many problems remain to be solved, the proposed framework appears promising both for explaining the emergence of higher intelligence in living organisms and for developing human-like intelligence.
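The abstract names two concrete ingredients: a chaotic neural network whose own dynamics supply the exploratory variability, and a learning rule driven by the TD error of the state value together with a per-input "contribution trace". The abstract gives no equations, so the sketch below is only a minimal illustration of those ideas: it pairs an Aihara-style chaotic neuron model (two decaying internal states, for feedback and refractoriness) with a standard TD error and a generic eligibility-trace update as a stand-in for the paper's contribution trace. All names (`ChaoticNeuralNet`, `td_update`), parameter values, and the linear value function are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)


class ChaoticNeuralNet:
    """Aihara-style chaotic neurons: each unit keeps two decaying internal
    states, one for recurrent feedback and one for refractoriness, so the
    network can wander chaotically with no injected random noise."""

    def __init__(self, n, k_f=0.5, k_r=0.8, alpha=1.0, gain=10.0):
        self.w = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))  # recurrent weights
        self.k_f, self.k_r = k_f, k_r  # decay rates of the two internal states
        self.alpha = alpha             # strength of the refractory effect
        self.gain = gain               # sigmoid steepness
        self.y = np.zeros(n)           # feedback internal state
        self.z = np.zeros(n)           # refractory internal state
        self.x = rng.random(n)         # neuron outputs in (0, 1)

    def step(self, ext_in):
        # Decayed recurrent input plus external input.
        self.y = self.k_f * self.y + self.w @ self.x + ext_in
        # Refractory term suppresses recently active neurons.
        self.z = self.k_r * self.z - self.alpha * self.x
        self.x = 1.0 / (1.0 + np.exp(-self.gain * (self.y + self.z)))
        return self.x


def td_update(w_v, e, phi, phi_next, r, gamma=0.9, lam=0.9, lr=0.1):
    """TD(0) error of a linear state value plus an eligibility-trace update,
    a generic stand-in for the paper's per-input contribution trace."""
    delta = r + gamma * w_v @ phi_next - w_v @ phi  # TD error of the state value
    e = gamma * lam * e + phi                       # decaying trace of each input
    w_v = w_v + lr * delta * e                      # credit inputs via the trace
    return w_v, e, delta
```

For instance, stepping the network with a small constant input and using its outputs as the features `phi` would let the agent's exploration arise purely from the chaotic state trajectory rather than from added noise, which is the core of the proposed framework.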
