How to Stay Curious while avoiding Noisy TVs using Aleatoric Uncertainty Estimation

When extrinsic rewards are sparse, artificial agents struggle to explore an environment. Curiosity, implemented as an intrinsic reward for prediction errors, can improve exploration, but it is known to fail when faced with action-dependent noise sources (‘noisy TVs’). To make exploring agents robust to noisy TVs, we present a simple solution: aleatoric mapping agents (AMAs). AMAs are a novel form of curiosity that explicitly ascertains which state transitions of the environment are unpredictable, even if those dynamics are induced by the actions of the agent. This is achieved by generating separate forward predictions for the mean and aleatoric uncertainty of future states, with the aim of reducing intrinsic rewards for those transitions that are unpredictable. We demonstrate that, in a range of environments, AMAs are able to circumvent action-dependent stochastic traps that immobilise conventional curiosity-driven agents. Furthermore, we demonstrate empirically that other common exploration approaches, previously thought to be immune to agent-induced randomness, can be trapped by stochastic dynamics. Code to reproduce our experiments is provided.
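
To make the idea of separate mean and uncertainty predictions concrete, the following is a minimal sketch, not the implementation provided with the paper. It assumes a forward model that outputs a predicted mean and a per-dimension log-variance for the next state, trained with a heteroscedastic Gaussian negative log-likelihood, and an intrinsic reward equal to the squared prediction error minus a scaled predicted variance. The function names, the log-variance parameterisation, and the weighting parameter `eta` are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np


def gaussian_nll(next_state, pred_mean, pred_log_var):
    """Heteroscedastic Gaussian negative log-likelihood (up to a constant).

    Assumption: the forward model predicts a mean and a log-variance per
    state dimension. A high predicted variance down-weights the squared
    error but is penalised by the log-variance term.
    """
    sq_err = (next_state - pred_mean) ** 2
    return float(np.sum(sq_err * np.exp(-pred_log_var) + pred_log_var))


def intrinsic_reward(next_state, pred_mean, pred_log_var, eta=1.0):
    """Prediction error reduced by the predicted aleatoric uncertainty.

    Assumption: `eta` is a hypothetical weight on the uncertainty term.
    With eta = 0 this reduces to a plain prediction-error curiosity reward.
    """
    sq_err = np.sum((next_state - pred_mean) ** 2)
    predicted_var = np.sum(np.exp(pred_log_var))
    return float(sq_err - eta * predicted_var)


# Example: a 2-dimensional state with unit predicted variance per dimension.
next_state = np.array([1.0, 2.0])
pred_mean = np.zeros(2)
pred_log_var = np.zeros(2)

reward_with_uncertainty = intrinsic_reward(next_state, pred_mean, pred_log_var)
reward_plain_curiosity = intrinsic_reward(next_state, pred_mean, pred_log_var, eta=0.0)
```

Under these assumptions, a transition whose error is matched by the predicted variance (such as a noisy TV the model has learned to flag as unpredictable) yields little or no intrinsic reward, whereas a transition with high error and low predicted variance stays rewarding.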
