Razvan Pascanu | Nicolas Heess | Alexandre Galashov | Francesco Visin | Pablo Sprechmann | Sebastian Flennerhag | André Barreto | Steven Kapturowski | Diana L. Borsa | Jane X. Wang
[1] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, 1991 IEEE International Joint Conference on Neural Networks.
[2] Pushmeet Kohli, et al. Strong Generalization and Efficiency in Neural Programs, 2020, ArXiv.
[3] Zheng Wen, et al. Deep Exploration via Randomized Value Functions, 2017, J. Mach. Learn. Res.
[4] Michel Tokic. Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Differences, 2010.
[5] Pierre-Yves Oudeyer, et al. What is Intrinsic Motivation? A Typology of Computational Approaches, 2007, Frontiers in Neurorobotics.
[6] Catholijn M. Jonker, et al. Efficient exploration with Double Uncertain Value Networks, 2017, ArXiv.
[7] Ian Osband, et al. The Uncertainty Bellman Equation and Exploration, 2017, ICML.
[8] Dale Schuurmans, et al. Improving Policy Gradient by Exploring Under-appreciated Rewards, 2016, ICLR.
[9] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[10] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[11] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[12] Pierre-Yves Oudeyer, et al. Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, 2012, NIPS.
[13] Daan Wierstra, et al. Variational Intrinsic Control, 2016, ICLR.
[14] Mikhail Belkin, et al. Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning, 2020, ArXiv.
[15] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[16] Qiang Liu, et al. Learning to Explore via Meta-Policy Gradient, 2018, ICML.
[17] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933, Biometrika.
[18] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.
[19] Marcin Andrychowicz, et al. Parameter Space Noise for Exploration, 2017, ICLR.
[20] H. Sebastian Seung, et al. QXplore: Q-learning Exploration by Maximizing Temporal Difference Error, 2019, ArXiv.
[21] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[22] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[23] Junhyuk Oh, et al. A Self-Tuning Actor-Critic Algorithm, 2020, NeurIPS.
[24] Kamyar Azizzadenesheli, et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, 2018 Information Theory and Applications Workshop (ITA).
[25] Tor Lattimore, et al. Behaviour Suite for Reinforcement Learning, 2019, ICLR.
[26] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[27] Adam M. White. Developing a Predictive Approach to Knowledge, 2015.
[28] Rémi Munos, et al. Recurrent Experience Replay in Distributed Reinforcement Learning, 2018, ICLR.
[29] Xiang Ren, et al. Temporal Attribute Prediction via Joint Modeling of Multi-Relational Structure Evolution, 2020, IJCAI.
[30] Daniel Guo, et al. Never Give Up: Learning Directed Exploration Strategies, 2020, ICLR.
[31] Pieter Abbeel, et al. Stochastic Neural Networks for Hierarchical Reinforcement Learning, 2016, ICLR.
[32] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[33] Junhyuk Oh, et al. Self-Tuning Deep Reinforcement Learning, 2020, ArXiv.
[34] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[35] Anind K. Dey, et al. Maximum Entropy Inverse Reinforcement Learning, 2008, AAAI.
[36] Malcolm J. A. Strens. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[37] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[38] Ehud Ahissar, et al. Reinforcement active learning hierarchical loops, 2011, 2011 International Joint Conference on Neural Networks.
[39] Jing Peng, et al. Function Optimization using Connectionist Reinforcement Learning Algorithms, 1991.
[40] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[41] Tom Schaul, et al. Adapting Behaviour for Learning Progress, 2019, ArXiv.
[42] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[43] Honglak Lee, et al. Contingency-Aware Exploration in Reinforcement Learning, 2018, ICLR.
[44] Doina Precup, et al. Smart exploration in reinforcement learning using absolute temporal difference errors, 2013, AAMAS.
[45] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[46] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[47] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[48] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[49] Sebastian Tschiatschek, et al. Successor Uncertainties: exploration and uncertainty in temporal difference learning, 2018, NeurIPS.
[50] Karol Hausman, et al. Learning an Embedding Space for Transferable Robot Skills, 2018, ICLR.
[51] Pierre-Yves Oudeyer, et al. Intrinsic Motivation Systems for Autonomous Mental Development, 2007, IEEE Transactions on Evolutionary Computation.
[52] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[53] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.