A Local Temporal Difference Code for Distributional Reinforcement Learning
[1] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[2] Ilana B. Witten, et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target, 2016, Nature Neuroscience.
[3] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[4] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[5] Adam Kepecs, et al. A computational framework for the study of confidence in humans and animals, 2012, Philosophical Transactions of the Royal Society B: Biological Sciences.
[6] Marc G. Bellemare, et al. Statistics and Samples in Distributional Reinforcement Learning, 2019, ICML.
[7] Timothy E. J. Behrens, et al. Choice, uncertainty and value in prefrontal and cingulate cortex, 2008, Nature Neuroscience.
[8] Ethan S. Bromberg-Martin, et al. Multiple Timescales of Memory in Lateral Habenula and Dopamine Neurons, 2010, Neuron.
[9] I. J. Day. On the inversion of diffusion NMR data: Tikhonov regularization and optimal choice of the regularization parameter, 2011, Journal of Magnetic Resonance.
[10] N. Uchida, et al. Dopamine neurons share common response function for reward prediction error, 2016, Nature Neuroscience.
[11] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[12] Peter Dayan. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[13] William R. Stauffer, et al. Dopamine Reward Prediction Error Responses Reflect Marginal Utility, 2014, Current Biology.
[14] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[15] Zeb Kurth-Nelson, et al. A distributional code for value in dopamine-based reinforcement learning, 2020, Nature.
[16] Yee Whye Teh, et al. An Analysis of Categorical Distributional Reinforcement Learning, 2018, AISTATS.
[17] Ryan Webb, et al. Adaptive neural coding: from biological to behavioral decision-making, 2015, Current Opinion in Behavioral Sciences.
[18] Joseph W. Barter, et al. Beyond reward prediction errors: the role of dopamine in movement kinematics, 2015, Frontiers in Integrative Neuroscience.
[19] Peter Dayan, et al. Uncertainty in learning, choice, and visual fixation, 2019, Proceedings of the National Academy of Sciences.
[20] Marc G. Bellemare, et al. A Comparative Analysis of Expected and Distributional Reinforcement Learning, 2019, AAAI.
[21] Saori C. Tanaka, et al. Serotonin Differentially Regulates Short- and Long-Term Prediction of Rewards in the Ventral and Dorsal Striatum, 2007, PLoS ONE.
[22] A. Cooper, et al. Predictive Reward Signal of Dopamine Neurons, 2011.
[23] Peter Dayan, et al. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, 2001.
[24] Marc W. Howard, et al. Scale Invariant Value Computation for Reinforcement Learning in Continuous Time, 2017, AAAI Spring Symposia.
[25] Timothy E. J. Behrens, et al. Learning the value of information in an uncertain world, 2007, Nature Neuroscience.
[26] Division of Labor for Division: Inhibitory Interneurons with Different Spatial Landscapes in the Olfactory System, 2013, Neuron.
[27] Marc W. Howard, et al. A Scale-Invariant Internal Representation of Time, 2012, Neural Computation.
[28] Adrienne L. Fairhall, et al. Intrinsic Gain Modulation and Adaptive Neural Coding, 2008, PLoS Computational Biology.
[29] Saori C. Tanaka, et al. Serotonin Affects Association of Aversive Outcomes to Past Actions, 2009, The Journal of Neuroscience.
[30] David J. Foster, et al. Reverse replay of behavioural sequences in hippocampal place cells during the awake state, 2006, Nature.
[31] Andrew E. Yagle. Regularized Matrix Computations, 2005.
[32] Marc W. Howard, et al. Predicting the Future with Multi-scale Successor Representations, 2018, bioRxiv.
[33] Doina Precup, et al. Knowledge Representation for Reinforcement Learning using General Value Functions, 2018.
[34] Ilana B. Witten, et al. Specialized coding of sensory, motor, and cognitive variables in VTA dopamine neurons, 2019, Nature.
[35] M. Botvinick, et al. The successor representation in human reinforcement learning, 2016, Nature Human Behaviour.
[36] Daeyeol Lee, et al. Heterogeneous Coding of Temporally Discounted Values in the Dorsal and Ventral Striatum during Intertemporal Choice, 2011, Neuron.
[37] Eero P. Simoncelli, et al. Natural image statistics and neural representation, 2001, Annual Review of Neuroscience.
[38] Saori C. Tanaka, et al. Serotonin and the Evaluation of Future Rewards, 2007, Annals of the New York Academy of Sciences.
[39] A. Graybiel, et al. Prolonged Dopamine Signalling in Striatum Signals Proximity and Value of Distant Rewards, 2013, Nature.