From eye-blinks to state construction: Diagnostic benchmarks for online representation learning
Elliot A. Ludvig, R. Sutton, Adam White, Raksha Kumaraswamy, Sina Ghiassian, Banafsheh Rafiee, Z. Abbas
[1] Xin Li, et al. Training Recurrent Neural Networks Online by Learning Explicit State Variables, 2020, ICLR.
[2] Allan R. Wagner, et al. Expectancies and the Priming of STM, 2018.
[3] Martha White, et al. Meta-descent for Online, Continual Prediction, 2019, AAAI.
[4] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[5] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[6] N. Schneiderman. Interstimulus interval function of the nictitating membrane response of the rabbit under delay versus trace conditioning, 1966.
[7] C. L. Hull. The problem of stimulus equivalence in behavior theory, 1939.
[8] A. Dickinson. Contemporary Animal Learning Theory, 1981.
[9] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents (Extended Abstract), 2018, IJCAI.
[10] Paul J. Werbos, et al. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.
[11] Jing Peng, et al. An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories, 1990, Neural Computation.
[12] Alex Graves, et al. Decoupled Neural Interfaces using Synthetic Gradients, 2016, ICML.
[13] Shimon Whiteson, et al. Report on the 2008 Reinforcement Learning Competition, 2010, AI Mag.
[14] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[15] Johan S. Obando-Ceron, et al. Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research, 2020, ICML.
[16] N. Mackintosh. The psychology of animal learning, 1974.
[17] Tor Lattimore, et al. Behaviour Suite for Reinforcement Learning, 2019, ICLR.
[18] Chrissy M. Chubala, et al. Intertrial unconditioned stimuli differentially impact trace conditioning, 2017, Learning & Behavior.
[19] Richard S. Sutton, et al. Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System, 2008, Neural Computation.
[20] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[21] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[22] Razvan Pascanu, et al. Stabilizing Transformers for Reinforcement Learning, 2019, ICML.
[23] Richard S. Sutton, et al. Representation Search through Generate and Test, 2013, AAAI Workshop: Learning Rich Representations from Low-Level Sensors.
[24] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.
[25] Howard Eichenbaum, et al. The hippocampus, time, and memory across scales, 2013, Journal of Experimental Psychology: General.
[26] Pierre-Yves Oudeyer, et al. How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments, 2018, ArXiv.
[27] Yoshua Bengio, et al. Conditioning and time representation in long short-term memory networks, 2013, Biological Cybernetics.
[28] W. James. The Principles of Psychology, Vol. I, 2008.
[29] Richard S. Sutton, et al. A computational model of hippocampal function in trace conditioning, 2008, NIPS.
[30] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[31] Richard S. Sutton, et al. Online Learning with Random Representations, 1993, ICML.
[32] Sergey Levine, et al. The Mirage of Action-Dependent Baselines in Reinforcement Learning, 2018, ICML.
[33] Richard S. Sutton, et al. Learning to Predict Independent of Span, 2015, ArXiv.
[34] Charles R. Gallistel, et al. Memory and the Computational Brain: Why Cognitive Science Will Transform Neuroscience, 2009.
[35] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[36] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.
[37] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[38] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[39] Richard S. Sutton, et al. Time-Derivative Models of Pavlovian Reinforcement, 1990.
[40] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[41] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.
[42] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, 2001.
[43] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[44] D. Spalding. The Principles of Psychology, 1873, Nature.
[45] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[46] Joel Z. Leibo, et al. Unsupervised Predictive Memory in a Goal-Directed Agent, 2018, ArXiv.
[47] Richard S. Sutton, et al. On the role of tracking in stationary environments, 2007, ICML '07.
[48] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[49] Elliot A. Ludvig, et al. Evaluating the TD model of classical conditioning, 2012, Learning & Behavior.
[50] Yann Ollivier, et al. Unbiased Online Recurrent Optimization, 2017, ICLR.
[51] Richard S. Sutton, et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.
[52] Justin A. Harris, et al. Negative patterning is easier than a biconditional discrimination, 2008, Journal of Experimental Psychology: Animal Behavior Processes.
[53] André Luzardo, et al. The Rescorla-Wagner Drift-Diffusion Model, 2018.
[54] Larry Rudolph, et al. Implementation Matters in Deep RL: A Case Study on PPO and TRPO, 2020, ICLR.
[55] Joel Z. Leibo, et al. Generalization of Reinforcement Learners with Working and Episodic Memory, 2019, NeurIPS.
[56] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[57] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.