Tor Lattimore | Michal Valko | Bilal Piot | Alaa Saade | Thomas Mesnard | Zhaohan Daniel Guo | Rémi Munos | Mohammad Gheshlaghi Azar | Shantanu Thakoor | Bernardo Avila Pires
[1] Jürgen Schmidhuber, et al. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.
[2] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[3] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[4] David Lopez-Paz, et al. Optimizing the Latent Space of Generative Networks, 2017, ICML.
[5] Aapo Hyvärinen, et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, 2010, AISTATS.
[6] Jose Gallego-Posada, et al. GAIT: A Geometric Approach to Information Theory, 2020, AISTATS.
[7] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[8] Christoph Dann, et al. Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning, 2015, NIPS.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] Tor Lattimore, et al. Near-optimal PAC bounds for discounted MDPs, 2014, Theor. Comput. Sci.
[11] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[12] Gabriel Peyré, et al. Geometric Losses for Distributional Learning, 2019, ICML.
[13] Terrence J. Sejnowski, et al. Slow Feature Analysis: Unsupervised Learning of Invariances, 2002, Neural Computation.
[14] C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics, 1988.
[15] Peter Auer, et al. Autonomous Exploration For Navigating In MDPs, 2012, COLT.
[16] Marcello Restelli, et al. An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies, 2019, AAAI.
[17] Akshay Krishnamurthy, et al. Reward-Free Exploration for Reinforcement Learning, 2020, ICML.
[18] David K. Smith, et al. Dynamic Programming and Optimal Control, Volume 1, 1996.
[19] Alessandro Lazaric, et al. No-Regret Exploration in Goal-Oriented Reinforcement Learning, 2020, ICML.
[20] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[21] Daan Wierstra, et al. Variational Intrinsic Control, 2016, ICLR.
[22] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[23] Doina Precup, et al. Marginalized State Distribution Entropy Regularization in Policy Optimization, 2019, arXiv.
[24] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[25] Marc Pollefeys, et al. Episodic Curiosity through Reachability, 2018, ICLR.
[26] Sergey Levine, et al. Wasserstein Dependency Measure for Representation Learning, 2019, NeurIPS.
[27] Arkadi Nemirovski, et al. Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems, 2004, SIAM J. Optim.
[28] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[29] David Filliat, et al. State Representation Learning for Control: An Overview, 2018, Neural Networks.
[30] Martin J. Wainwright, et al. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization, 2008, IEEE Transactions on Information Theory.
[31] Dahua Lin, et al. Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination, 2018, arXiv.
[32] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[33] Sergey Levine, et al. Efficient Exploration via State Marginal Matching, 2019, arXiv.
[34] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[35] Benjamin Van Roy, et al. The Linear Programming Approach to Approximate Dynamic Programming, 2003, Oper. Res.
[36] Sergey Levine, et al. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning, 2019, ICML.
[37] Robert E. Kalaba, et al. Dynamic Programming and Modern Control Theory, 1966.
[38] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[39] Ronald Ortner, et al. Autonomous exploration for navigating in non-stationary CMPs, 2019, arXiv.
[40] Ian J. Goodfellow, et al. NIPS 2016 Tutorial: Generative Adversarial Networks, 2016, arXiv.
[41] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[42] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[43] L. Györfi, et al. Nonparametric entropy estimation: An overview, 1997.
[44] Haim Kaplan, et al. Near-optimal Regret Bounds for Stochastic Shortest Path, 2020, ICML.
[45] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[46] Albin Cassirer, et al. Randomized Prior Functions for Deep Reinforcement Learning, 2018, NeurIPS.