Gradient-Free Neural Network Training via Synaptic-Level Reinforcement Learning

An ongoing challenge in neural information processing is the question of how neurons adjust their connectivity to improve task performance over time (i.e., actualize learning). It is widely believed that a consistent, synaptic-level learning mechanism in specific brain regions, such as the basal ganglia, actualizes learning; however, the exact nature of this mechanism remains unclear. Here we investigate the use of universal synaptic-level algorithms for training connectionist models. Specifically, we propose an algorithm based on reinforcement learning (RL) that generates and applies a simple, biologically inspired synaptic-level learning policy for multi-layer perceptron (MLP) models. In this algorithm, the action space for each MLP synapse consists of a small increase, a small decrease, or a null action on the synapse weight, and the state for each synapse consists of its last two actions together with recent global reward signals. A binary reward signal indicates whether the model loss increased or decreased between the previous two iterations. The algorithm yields a static synaptic learning policy that enables the simultaneous training of over 20,000 parameters (i.e., synapses) and consistent MLP convergence when applied to simulated decision-boundary matching and optical character recognition tasks. The static policy is robust: it produces faster and more consistent training than the adaptive policy and is agnostic to activation function, network shape, and task. The trained networks achieve character recognition performance comparable to identically shaped networks trained with gradient descent. Character recognition tests with 0 hidden units yielded an average validation accuracy of 88.28%, 1.86±0.47% higher than the same MLP trained with gradient descent; tests with 32 hidden units yielded an average validation accuracy of 88.45%, 1.11±0.79% lower than the gradient-descent-trained counterpart. The approach has two significant advantages over traditional gradient-descent-based optimization methods. First, its robustness and lack of reliance on gradient computations open the door to new techniques for training difficult-to-differentiate artificial neural networks, such as spiking neural networks (SNNs) and recurrent neural networks (RNNs). Second, its simplicity provides a unique opportunity for further development of local-rule-driven, multi-agent connectionist models for machine intelligence, analogous to cellular automata.
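The abstract specifies the mechanism only at a high level, so the following Python sketch is one plausible reading rather than the authors' implementation: the step size DELTA, the two-action/two-reward history window, the tabular policy, and all function names are illustrative assumptions. Each synapse agent picks a small increase, decrease, or null action from its recent actions and the shared binary reward, which flags whether the global loss fell between iterations.

```python
import numpy as np

# Assumed hyperparameters (not specified in the abstract).
DELTA = 0.01           # small weight increment per action
ACTIONS = (-1, 0, 1)   # decrease, null, increase (scaled by DELTA)

rng = np.random.default_rng(0)
policy = {}  # maps a discrete synapse state to an action; the paper learns this table with RL

def act(state):
    """Look up the static policy's action for a synapse state.
    Unseen states get a random placeholder action in this sketch."""
    if state not in policy:
        policy[state] = ACTIONS[rng.integers(len(ACTIONS))]
    return policy[state]

def train_step(weights, histories, loss_fn, prev_loss):
    """One synchronous update: every synapse acts, then all observe
    the same binary reward (did the global loss decrease?)."""
    for i in range(weights.size):
        a_hist, r_hist = histories[i]
        a = act(tuple(a_hist) + tuple(r_hist))  # state = last two actions + recent rewards
        weights.flat[i] += a * DELTA
        a_hist.append(a)
        a_hist.pop(0)
    loss = loss_fn(weights)
    reward = 1 if loss < prev_loss else 0  # binary global reward signal
    for _, r_hist in histories:
        r_hist.append(reward)
        r_hist.pop(0)
    return loss

# Usage sketch: drive the loop on a toy quadratic "loss" to show the mechanics.
w = rng.normal(size=(4,))
hists = [([0, 0], [0, 0]) for _ in range(w.size)]
loss_fn = lambda v: float(np.sum(v ** 2))
prev = loss_fn(w)
for _ in range(100):
    prev = train_step(w, hists, loss_fn, prev)
```

With a random placeholder policy the loop will not converge; the paper's contribution is precisely that an RL-learned, then frozen (static) policy table makes this gradient-free update converge reliably.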
