Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock

In this paper we introduce a reinforcement learning (RL) approach for training policies, including artificial neural network policies, that is both backpropagation-free and clock-free. It is backpropagation-free in that it does not propagate any information backwards through the network. It is clock-free in that no signal is given to each node in the network to specify when it should compute its output and when it should update its weights. We contend that these two properties increase the biological plausibility of our algorithms and facilitate distributed implementations. Additionally, our approach eliminates the need for customized learning rules for hierarchical RL algorithms like the option-critic.

[1]  D. Bertsekas Gradient convergence in gradient methods , 1997 .

[2]  Jakub W. Pachocki,et al.  Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[3]  Richard L. Lewis,et al.  Optimal Rewards for Cooperative Agents , 2014, IEEE Transactions on Autonomous Mental Development.

[4]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[5]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[6]  Pieter Abbeel,et al.  Gradient Estimation Using Stochastic Computation Graphs , 2015, NIPS.

[7]  John S. Edwards,et al.  The Hedonistic Neuron: A Theory of Memory, Learning and Intelligence , 1983 .

[8]  Andrew G. Barto,et al.  Motor primitive discovery , 2012, 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).

[9]  OpenAI Learning Dexterous In-Hand Manipulation. , 2018 .

[10]  A G Barto,et al.  Learning by statistical cooperation of self-interested neuron-like computing elements. , 1985, Human neurobiology.

[11]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[12]  Bart De Schutter,et al.  Multi-agent Reinforcement Learning: An Overview , 2010 .

[13]  Philip S. Thomas,et al.  Policy Gradient Coagent Networks , 2011, NIPS.

[14]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[15]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[16]  Andrew G. Barto,et al.  Conjugate Markov Decision Processes , 2011, ICML.

[17]  Paul J. Werbos,et al.  Regular Cycles of Forward and Backward Signal Propagation in Prefrontal Cortex and in Consciousness , 2016, Front. Syst. Neurosci..

[18]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..