Learning more skills through optimistic exploration