PARAMETER SPACE NOISE FOR EXPLORATION
Richard Y. Chen | P. Abbeel | Prafulla Dhariwal | Marcin Andrychowicz | Rein Houthooft | Xi Chen | T. Asfour | Szymon Sidor | Matthias Plappert
[1] Shane Legg,et al. Noisy Networks for Exploration , 2017, ICLR.
[2] Ilya Kostrikov,et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play , 2017, ICLR.
[3] Shipra Agrawal,et al. Near-Optimal Regret Bounds for Thompson Sampling , 2017, J. ACM.
[4] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[5] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.
[6] Filip De Turck,et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning , 2016, NIPS.
[7] Georg Ostrovski,et al. Count-Based Exploration with Neural Density Models , 2017, ICML.
[8] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.
[9] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[10] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[11] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[12] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[13] Sergey Levine,et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models , 2015, ArXiv.
[14] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[15] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[16] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[17] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[18] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.
[19] Tom Schaul,et al. High dimensions and heavy tails for natural evolution strategies , 2011, GECCO '11.
[20] Tom Schaul,et al. A Natural Evolution Strategy for Multi-objective Optimization , 2010, PPSN.
[21] Tom Schaul,et al. Exponential natural evolution strategies , 2010, GECCO '10.
[22] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.
[23] Tom Schaul,et al. Efficient natural evolution strategies , 2009, GECCO.
[24] Tom Schaul,et al. Stochastic search using the natural gradient , 2009, ICML '09.
[25] Jürgen Schmidhuber,et al. State-Dependent Exploration for Policy Gradient Methods , 2008, ECML/PKDD.
[26] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[27] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[28] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[29] Ananth Ranganathan,et al. The Levenberg-Marquardt Algorithm , 2004 .
[30] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res.
[31] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[32] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[33] H. P. Schwefel,et al. Numerische Optimierung von Computermodellen mittels der Evolutionsstrategie , 1977 .
[34] Ingo Rechenberg,et al. Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .
[35] G. Uhlenbeck,et al. On the Theory of the Brownian Motion , 1930 .