Brain Inspired Reinforcement Learning

Successful application of reinforcement learning algorithms often requires considerable hand-crafting of non-linear features that reduce the complexity of the value function and thereby promote convergence of the algorithm. In contrast, the human brain readily and autonomously discovers complex features when given sufficient training. Recent work in machine learning and neurophysiology has demonstrated the role of the basal ganglia and the frontal cortex in mammalian reinforcement learning. This paper develops and explores new reinforcement learning algorithms inspired by this neurological evidence, which points to potential new approaches to the feature construction problem. The algorithms are compared and evaluated on the Acrobot task.
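To make the "hand-crafted non-linear features" concrete, here is a minimal sketch, assuming a linear value function V(s) = w · φ(s) learned with TD(0) on a hypothetical five-state chain. The Gaussian radial-basis features, their centres and width, and all parameter values are illustrative assumptions chosen by hand. This is a generic example of the hand-crafting the abstract describes, not the paper's brain-inspired algorithm, and not the Acrobot task.

```python
import numpy as np

def rbf_features(state, centers, width):
    # Hypothetical hand-crafted non-linear features: one Gaussian bump per fixed centre.
    sq_dist = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def run_td0(num_states=5, gamma=0.9, alpha=0.1, episodes=500, width=0.3):
    # Deterministic chain 0 -> 1 -> ... -> num_states-1 -> terminal,
    # with reward 1 only on the final transition (an illustrative toy task).
    centers = np.arange(num_states, dtype=float).reshape(-1, 1)
    w = np.zeros(num_states)  # weights of the linear value function V(s) = w . phi(s)
    for _ in range(episodes):
        for s in range(num_states):
            phi_s = rbf_features(np.array([float(s)]), centers, width)
            terminal = s == num_states - 1
            reward = 1.0 if terminal else 0.0
            if terminal:
                v_next = 0.0  # value of the terminal state is zero
            else:
                v_next = w @ rbf_features(np.array([float(s + 1)]), centers, width)
            # TD(0) error and semi-gradient update of the linear weights.
            delta = reward + gamma * v_next - w @ phi_s
            w += alpha * delta * phi_s
    return np.array([w @ rbf_features(np.array([float(s)]), centers, width)
                     for s in range(num_states)])

values = run_td0()
print(np.round(values, 3))
```

With narrow, well-placed bumps the learned values approach the true discounted returns γ^(4−s). Moving the centres or changing the width by hand changes how well the linear learner can represent the value function; this is the manual tuning that the abstract contrasts with the brain's autonomous feature discovery.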
