Junhyuk Oh | Satinder Singh | David Silver | Zeyu Zheng | Matteo Hessel | Zhongwen Xu | Hado van Hasselt | Manuel Kroiss
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[3] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[4] Jürgen Schmidhuber, et al. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.
[5] Jürgen Schmidhuber, et al. Curious model-building control systems, 1991, Proceedings of the 1991 IEEE International Joint Conference on Neural Networks.
[6] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[7] Jieyu Zhao, et al. Simple Principles of Metalearning, 1996.
[8] Sebastian Thrun, et al. Learning to Learn: Introduction and Overview, 1998, Learning to Learn.
[9] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[10] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[11] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[12] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[13] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[14] Angela J. Yu, et al. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration, 2007, Philosophical Transactions of the Royal Society B: Biological Sciences.
[15] Pierre-Yves Oudeyer, et al. Intrinsic Motivation Systems for Autonomous Mental Development, 2007, IEEE Transactions on Evolutionary Computation.
[16] Michael L. Littman, et al. An analysis of model-based Interval Estimation for Markov Decision Processes, 2008, J. Comput. Syst. Sci.
[17] Richard L. Lewis, et al. Where Do Rewards Come From?, 2009.
[18] Pierre Baldi, et al. Bayesian surprise attracts human attention, 2005, Vision Research.
[19] Richard L. Lewis, et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective, 2010, IEEE Transactions on Autonomous Mental Development.
[20] Richard L. Lewis, et al. Reward Design via Online Gradient Ascent, 2010, NIPS.
[21] Ehud Ahissar, et al. Reinforcement active learning hierarchical loops, 2011, The 2011 International Joint Conference on Neural Networks.
[22] Marco Mirolli, et al. Functions and Mechanisms of Intrinsic Motivations, 2013, Intrinsically Motivated Learning in Natural and Artificial Systems.
[23] Jonathan D. Cohen, et al. Humans use directed and random exploration to solve the explore-exploit dilemma, 2014, Journal of Experimental Psychology: General.
[24] Sam Devlin, et al. Expressing Arbitrary Reward Functions as Potential-Based Advice, 2015, AAAI.
[25] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[26] Honglak Lee, et al. Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games, 2016, IJCAI.
[27] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[28] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[29] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[30] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[31] Sergey Levine, et al. One-Shot Visual Imitation Learning via Meta-Learning, 2017, CoRL.
[32] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[33] David Silver, et al. Meta-Gradient Reinforcement Learning, 2018, NeurIPS.
[34] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[35] Qiang Liu, et al. Learning to Explore via Meta-Policy Gradient, 2018, ICML.
[36] Pieter Abbeel, et al. The Importance of Sampling in Meta-Reinforcement Learning, 2018, NeurIPS.
[37] Martha White, et al. Discovery of Predictive Representations With a Network of General Value Functions, 2018.
[38] Pushmeet Kohli, et al. Learning to Understand Goal Specifications by Modelling Reward, 2018, ICLR.
[39] Richard L. Lewis, et al. Discovery of Useful Questions as Auxiliary Tasks, 2019, NeurIPS.
[40] Anca D. Dragan, et al. Learning a Prior over Intent via Meta-Inverse Reinforcement Learning, 2018, ICML.
[41] Sergey Levine, et al. Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning, 2018, ICLR.
[42] Sergey Levine, et al. InfoBot: Transfer and Exploration via the Information Bottleneck, 2019, ICLR.
[43] Jascha Sohl-Dickstein, et al. Meta-Learning Update Rules for Unsupervised Representation Learning, 2018, ICLR.
[44] Martha White, et al. Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study, 2019, ArXiv.
[45] T. Griffiths, et al. Reconciling novelty and complexity through a rational analysis of curiosity, 2019, Psychological Review.
[46] Louis Kirsch, et al. Improving Generalization in Meta Reinforcement Learning using Learned Objectives, 2020, ICLR.
[47] Yevgen Chebotar, et al. Meta Learning via Learned Loss, 2020, 25th International Conference on Pattern Recognition (ICPR).