Exploring Unknown States with Action Balance
[1] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[2] Martin Müller, et al. On Principled Entropy Exploration in Policy Optimization, 2019, IJCAI.
[3] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[4] Demis Hassabis, et al. Mastering the game of Go without human knowledge, 2017, Nature.
[5] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 2002, Machine Learning.
[6] Marc G. Bellemare, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[7] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[8] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[9] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[10] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[11] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[12] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[13] Moses Charikar, et al. Similarity estimation techniques from rounding algorithms, 2002, STOC '02.
[14] Sham M. Kakade, et al. Provably Efficient Maximum Entropy Exploration, 2018, ICML.
[15] Salima Hassas, et al. A survey on intrinsic motivation in reinforcement learning, 2019, ArXiv.
[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[17] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[18] Marc Pollefeys, et al. Episodic Curiosity through Reachability, 2018, ICLR.
[19] Alex Graves, et al. Conditional Image Generation with PixelCNN Decoders, 2016, NIPS.
[20] Marcus Hutter, et al. Count-Based Exploration in Feature Space for Reinforcement Learning, 2017, IJCAI.
[21] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[22] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[23] Kenneth O. Stanley, et al. Go-Explore: a New Approach for Hard-Exploration Problems, 2019, ArXiv.
[24] Marlos C. Machado, et al. Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment, 2019, ArXiv.