A distributional code for value in dopamine-based reinforcement learning
Zeb Kurth-Nelson | Clara Kwon Starkweather | Naoshige Uchida | Demis Hassabis | Matthew Botvinick | Will Dabney | Rémi Munos
[1] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[2] Yael Niv, et al. Opening Burton's Clock: Psychiatric insights from computational cognitive models, 2018.
[3] Tyrone D. Cannon, et al. Striatal dopamine D1 and D2 receptor balance in twins at increased genetic risk for schizophrenia, 2006, Psychiatry Research: Neuroimaging.
[4] Naoshige Uchida, et al. Arithmetic and local circuitry underlying dopamine prediction errors, 2015, Nature.
[5] Rafal Bogacz, et al. Learning Reward Uncertainty in the Basal Ganglia, 2016, PLoS Comput. Biol.
[6] P. Dayan, et al. A computational and neural model of momentary subjective well-being, 2014, Proceedings of the National Academy of Sciences.
[7] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[8] N. Uchida, et al. Neural Circuitry of Reward Prediction Error, 2017, Annual Review of Neuroscience.
[9] A. Pouget, et al. Probabilistic brains: knowns and unknowns, 2013, Nature Neuroscience.
[10] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[11] Masashi Sugiyama, et al. Parametric Return Density Estimation for Reinforcement Learning, 2010, UAI.
[12] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[13] Anne E Carpenter, et al. Neuron-type specific signals for reward and punishment in the ventral tegmental area, 2011, Nature.
[14] Rémi Munos, et al. Implicit Quantile Networks for Distributional Reinforcement Learning, 2018, ICML.
[15] B. Hoffer, et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus, 2006, Genesis.
[16] Naoshige Uchida, et al. Habenula Lesions Reveal that Multiple Mechanisms Underlie Dopamine Prediction Errors, 2015, Neuron.
[17] Marc G. Bellemare, et al. Statistics and Samples in Distributional Reinforcement Learning, 2019, ICML.
[18] N. Uchida, et al. Dopamine neurons share common response function for reward prediction error, 2016, Nature Neuroscience.
[19] S. Lammel, et al. Reward and aversion in a heterogeneous midbrain dopamine system, 2014, Neuropharmacology.
[20] Johanna F. Ziegel, et al. Coherence and Elicitability, 2013, 1303.1690.
[21] Minryung R. Song, et al. Multiphasic Temporal Dynamics in Responses of Midbrain Dopamine Neurons to Appetitive and Aversive Stimuli, 2013, The Journal of Neuroscience.
[22] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[23] Matthew W. Hoffman, et al. Distributed Distributional Deterministic Policy Gradients, 2018, ICLR.
[24] W. Schultz, et al. The phasic dopamine signal maturing: from reward via behavioural activation to formal economic utility, 2017, Current Opinion in Neurobiology.
[25] W. Newey, et al. Asymmetric Least Squares Estimation and Testing, 1987.
[26] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[27] P. Glimcher. Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis, 2011, Proceedings of the National Academy of Sciences.
[28] Pedro Rosa-Neto, et al. Gradients of dopamine D1- and D2/3-binding sites in the basal ganglia of pig and monkey measured by PET, 2004, NeuroImage.
[29] M. C. Jones. Expectiles and M-quantiles are quantiles, 1994.
[30] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[31] William R. Stauffer, et al. Dopamine Reward Prediction Error Responses Reflect Marginal Utility, 2014, Current Biology.
[32] M. Botvinick, et al. Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective, 2009, Cognition.
[33] Xiao-Jing Wang, et al. Reward-based training of recurrent neural networks for cognitive and value-based tasks, 2016, bioRxiv.
[34] Joel Z. Leibo, et al. Prefrontal cortex as a meta-reinforcement learning system, 2018, bioRxiv.
[35] E. Perry, et al. Dopaminergic activities in the human striatum: rostrocaudal gradients of uptake sites and of D1 and D2 but not of D3 receptor binding or dopamine, 1999, Neuroscience.
[36] Michael J. Frank, et al. By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism, 2004, Science.
[37] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[38] P. Dayan, et al. Depression: a decision-theoretic analysis, 2015, Annual Review of Neuroscience.
[39] W. Schultz, et al. Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons, 2003, Science.