Escaping Stochastic Traps with Aleatoric Mapping Agents

Exploration in environments with sparse rewards is difficult for artificial agents. Curiosity-driven learning, which uses feed-forward prediction errors as intrinsic rewards, has achieved some success in these scenarios but fails when faced with action-dependent noise sources. We present aleatoric mapping agents (AMAs), a neuroscience-inspired solution modeled on the cholinergic system of the mammalian brain. AMAs aim to explicitly ascertain which dynamics of the environment are unpredictable, regardless of whether those dynamics are induced by the actions of the agent. This is achieved by generating separate forward predictions for the mean and variance of future states and reducing intrinsic rewards for those transitions with high aleatoric variance. We show that AMAs are able to effectively circumvent the action-dependent stochastic traps that immobilise conventional curiosity-driven agents. The code for all experiments presented in this paper is open-sourced.
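The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code. It assumes a forward model that outputs a mean and a log-variance for each dimension of the next state, a heteroscedastic Gaussian negative log-likelihood as the training loss, and an intrinsic reward equal to the squared prediction error minus a scaled predicted variance, clipped at zero. The function names, the `eta` scaling factor and the clipping are all illustrative assumptions.

```python
import numpy as np


def gaussian_nll(target, mean, log_var):
    """Heteroscedastic Gaussian negative log-likelihood, up to a constant.

    Assumed training loss for a forward model with separate mean and
    log-variance outputs; summed over state dimensions, one value per sample.
    """
    sq_err = (target - mean) ** 2
    return 0.5 * np.sum(np.exp(-log_var) * sq_err + log_var, axis=-1)


def intrinsic_reward(target, mean, log_var, eta=1.0):
    """Prediction error discounted by predicted aleatoric variance.

    `eta` (hypothetical) scales how strongly predicted variance reduces the
    reward. Clipping at zero is an assumption of this sketch.
    """
    sq_err = (target - mean) ** 2
    var = np.exp(log_var)
    return np.maximum(np.sum(sq_err - eta * var, axis=-1), 0.0)


# Two transitions with the same prediction error but different predicted noise.
target = np.array([[2.0, 0.0]])
mean = np.zeros((1, 2))

# Model has learned that the first dimension is noisy (variance 4).
noisy_log_var = np.log(np.array([[4.0, 1e-6]]))
# Model is confident (variance 0.01), so the error signals genuine novelty.
confident_log_var = np.log(np.array([[0.01, 0.01]]))

r_noisy = intrinsic_reward(target, mean, noisy_log_var)
r_novel = intrinsic_reward(target, mean, confident_log_var)
```

In this sketch, the transition the model has learned to treat as noisy earns no intrinsic reward, while the same error under a confident prediction keeps almost all of it. This is the behaviour the abstract credits with letting agents escape stochastic traps.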
