[1] R. J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[2] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.
[3] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[4] Sham M. Kakade. A Natural Policy Gradient, 2001, NIPS.
[5] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.
[6] Andrew Y. Ng, et al. Near-Bayesian Exploration in Polynomial Time, 2009, ICML.
[7] Shane Legg, et al. Human-level Control through Deep Reinforcement Learning, 2015, Nature.
[8] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[9] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[10] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[11] Traian Rebedea, et al. Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay, 2016, ArXiv.
[12] Koray Kavukcuoglu, et al. PGQ: Combining Policy Gradient and Q-learning, 2016, ArXiv.
[13] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[14] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[15] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[16] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, CVPRW.
[17] Elman Mansimov, et al. Scalable Trust-Region Method for Deep Reinforcement Learning Using Kronecker-Factored Approximation, 2017, NIPS.
[18] Sergey Levine, et al. Reinforcement Learning with Deep Energy-Based Policies, 2017, ICML.
[19] Dale Schuurmans, et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning, 2017, NIPS.
[20] Pieter Abbeel, et al. Reverse Curriculum Generation for Reinforcement Learning, 2017, CoRL.
[21] Georg Ostrovski, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[22] Pieter Abbeel, et al. Equivalence Between Policy Gradients and Soft Q-Learning, 2017, ArXiv.
[23] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[24] Henryk Michalewski, et al. Expert-Augmented Actor-Critic for ViZDoom and Montezuma's Revenge, 2018, ArXiv.
[25] Richard Y. Chen, et al. UCB Exploration via Q-Ensembles, 2018, ArXiv.
[26] Sergey Levine, et al. DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, 2018, ACM Trans. Graph.
[27] Marlos C. Machado, et al. Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents, 2017, J. Artif. Intell. Res.
[28] Nando de Freitas, et al. Playing Hard Exploration Games by Watching YouTube, 2018, NeurIPS.
[29] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[30] Joan Bruna, et al. Backplay: "Man muss immer umkehren", 2018, ArXiv.
[31] Rémi Munos, et al. Observe and Look Further: Achieving Consistent Performance on Atari, 2018, ArXiv.
[32] Marcin Andrychowicz, et al. Overcoming Exploration in Reinforcement Learning with Demonstrations, 2018, ICRA.
[33] Tom Schaul, et al. Deep Q-learning from Demonstrations, 2017, AAAI.