[1] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[2] Benjamin Van Roy,et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[3] Elad Hazan,et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.
[4] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[5] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[6] Sergey Levine,et al. Visual Reinforcement Learning with Imagined Goals , 2018, NeurIPS.
[7] Shakir Mohamed,et al. Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning , 2015, NIPS.
[8] Christoph Dann,et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning , 2015, NIPS.
[9] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[10] Tor Lattimore,et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[11] Sébastien Bubeck,et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.
[12] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[13] Alexei A. Efros,et al. Large-Scale Study of Curiosity-Driven Learning , 2018, ICLR.
[14] Emanuel Todorov,et al. Linearly-solvable Markov decision problems , 2006, NIPS.
[15] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[16] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[17] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[18] Leslie Pack Kaelbling,et al. Learning to Achieve Goals , 1993, IJCAI.
[19] Satinder Singh,et al. On Learning Intrinsic Rewards for Policy Gradient Methods , 2018, NeurIPS.
[20] Sham M. Kakade,et al. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator , 2018, ICML.
[21] David Warde-Farley,et al. Unsupervised Control Through Non-Parametric Discriminative Rewards , 2018, ICLR.
[22] Peter Auer,et al. Autonomous Exploration For Navigating In MDPs , 2012, COLT.
[23] Amos J. Storkey,et al. Exploration by Random Network Distillation , 2018, ICLR.
[24] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[25] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[26] Vicenç Gómez,et al. A unified view of entropy-regularized Markov decision processes , 2017, ArXiv.
[27] Albin Cassirer,et al. Randomized Prior Functions for Deep Reinforcement Learning , 2018, NeurIPS.
[28] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[29] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[30] Philip Wolfe,et al. An algorithm for quadratic programming, 1956.
[31] Yasemin Altun,et al. Relative Entropy Policy Search, 2010.
[32] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[33] Marc G. Bellemare,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[34] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[35] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[36] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[37] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[38] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.
[39] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.
[40] Richard L. Lewis,et al. Where Do Rewards Come From, 2009.
[41] Marc Pollefeys,et al. Episodic Curiosity through Reachability , 2018, ICLR.
[42] Csaba Szepesvári,et al. Model-based reinforcement learning with nearly tight exploration complexity bounds , 2010, ICML.
[43] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[44] Richard L. Lewis,et al. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective , 2010, IEEE Transactions on Autonomous Mental Development.
[45] Pierre-Yves Oudeyer,et al. Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress , 2012, NIPS.
[46] Justin Fu,et al. EX2: Exploration with Exemplar Models for Deep Reinforcement Learning , 2017, NIPS.
[47] Sham M. Kakade,et al. On the sample complexity of reinforcement learning, 2003.