Interval timing in deep reinforcement learning agents

The measurement of time is central to intelligent behavior. We know that both animals and artificial agents can successfully use temporal dependencies to select actions. In artificial agents, little work has directly addressed (1) which architectural components are necessary for successful development of this ability, (2) how this timing ability comes to be represented in the units and actions of the agent, and (3) whether the resulting behavior of the system converges on solutions similar to those of biology. Here we study interval timing abilities in deep reinforcement learning agents trained end-to-end on an interval reproduction paradigm inspired by the experimental literature on mechanisms of timing. We characterize the strategies developed by recurrent and feedforward agents, both of which succeed at temporal reproduction but through distinct mechanisms, some of which bear specific and intriguing similarities to biological systems. These findings advance our understanding of how agents come to represent time, and they highlight the value of experimentally inspired approaches to characterizing agent abilities.
