Optimizing agent behavior over long time scales by transporting value
Yan Wu | Arun Ahuja | Greg Wayne | Chia-Chun Hung | Mehdi Mirza | Timothy P. Lillicrap | Federico Carnevale | Josh Abramson
[1] H. Blodgett, et al. The effect of the introduction of reward upon the maze performance of rats, 1929.
[2] P. Samuelson. A Note on Measurement of Utility, 1937.
[3] E. Tolman. Cognitive maps in rats and men, 1948, Psychological Review.
[4] Allen Newell, et al. The chess machine: an example of dealing with a complex task by adaptation, 1955, AFIPS '55 (Western).
[5] Arthur L. Samuel. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[6] Marvin Minsky. Steps toward Artificial Intelligence, 1961, Proceedings of the IRE.
[7] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.
[8] F. Ellis. Intrahousehold Resource Allocation in Developing Countries: Methods, Models, and Policy, 1997.
[9] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[11] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[12] R. Klein, et al. The dawn of human culture, 2002.
[13] G. Loewenstein, et al. Time Discounting and Time Preference: A Critical Review, 2002.
[14] Marcus Hutter. A Gentle Introduction to The Universal Algorithmic Agent AIXI, 2003.
[15] W. K. Cullen, et al. Dopamine-dependent facilitation of LTP induction in hippocampal CA1 by exposure to spatial novelty, 2003, Nature Neuroscience.
[16] M. McDaniel, et al. Delaying Execution of Intentions: Overcoming the Costs of Interruptions, 2004, Applied Cognitive Psychology.
[17] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[18] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[19] D. Fudenberg, et al. A Dual Self Model of Impulse Control, 2004, The American Economic Review.
[20] N. Lemon, et al. Dopamine D1/D5 Receptors Gate the Acquisition of Novel Information through Hippocampal Long-Term Potentiation and Long-Term Depression, 2006, The Journal of Neuroscience.
[21] D. Hassabis, et al. Using Imagination to Understand the Neural Basis of Episodic Memory, 2007, The Journal of Neuroscience.
[22] D. Schacter, et al. Remembering the past to imagine the future: the prospective brain, 2007, Nature Reviews Neuroscience.
[23] Peter Dayan, et al. Hippocampal Contributions to Control: The Third Way, 2007, NIPS.
[24] Russ Tedrake, et al. Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms, 2008, NIPS.
[25] Richard S. Sutton, et al. Sample-based learning and search with permanent and transient memories, 2008, ICML '08.
[26] Aude Oliva, et al. Visual long-term memory has a massive storage capacity for object details, 2008, Proceedings of the National Academy of Sciences.
[27] John R. Anderson, et al. Solving the credit assignment problem: explicit and implicit learning of action sequences with probabilistic outcomes, 2008, Psychological Research.
[28] Jan Peters, et al. Episodic Future Thinking Reduces Reward Delay Discounting through an Enhancement of Prefrontal-Mediotemporal Interactions, 2010, Neuron.
[29] Mohamed Chtourou, et al. On the training of recurrent neural networks, 2011, Eighth International Multi-Conference on Systems, Signals & Devices.
[30] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing.
[31] Geoffrey E. Hinton, et al. Training Recurrent Neural Networks, 2013.
[32] Philip Thomas. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[33] Michael C. Corballis. The Recursive Mind: The Origins of Human Language, Thought, and Civilization - Updated Edition, 2014.
[34] Alex Graves, et al. Neural Turing Machines, 2014, arXiv.
[35] Andrew Zisserman, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2013, ICLR.
[36] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[37] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[38] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[39] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[40] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[42] Sergio Gomez Colmenarejo, et al. Hybrid computing using a neural network with dynamic external memory, 2016, Nature.
[43] Joel Z. Leibo, et al. Model-Free Episodic Control, 2016, arXiv.
[44] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[45] John Schulman. Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, 2016.
[46] Francesco Visin, et al. A guide to convolution arithmetic for deep learning, 2016, arXiv.
[47] D. Hassabis, et al. Neuroscience-Inspired Artificial Intelligence, 2017, Neuron.
[48] N. Daw, et al. Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework, 2017, Annual Review of Psychology.
[49] Shane Legg, et al. Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents, 2018, arXiv.
[50] J. Pearl, et al. The Book of Why: The New Science of Cause and Effect, 2018.
[51] Joel Z. Leibo, et al. Unsupervised Predictive Memory in a Goal-Directed Agent, 2018, arXiv.
[52] Jürgen Schmidhuber, et al. Recurrent World Models Facilitate Policy Evolution, 2018, NeurIPS.
[53] Zeb Kurth-Nelson, et al. Been There, Done That: Meta-Learning with Episodic Recall, 2018, ICML.
[55] Christopher Joseph Pal, et al. Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding, 2018, NeurIPS.
[56] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[57] Sepp Hochreiter, et al. RUDDER: Return Decomposition for Delayed Rewards, 2018, NeurIPS.
[58] Jane X. Wang, et al. Reinforcement Learning, Fast and Slow, 2019, Trends in Cognitive Sciences.