Matthias Plappert | Rein Houthooft | Prafulla Dhariwal | Szymon Sidor | Richard Y. Chen | Xi Chen | Tamim Asfour | Pieter Abbeel | Marcin Andrychowicz
[1] G. Uhlenbeck, et al. On the Theory of the Brownian Motion, 1930.
[2] Ingo Rechenberg, et al. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, 1973.
[3] H.-P. Schwefel, et al. Numerische Optimierung von Computermodellen mittels der Evolutionsstrategie, 1977.
[4] Sebastian Thrun, et al. Efficient Exploration in Reinforcement Learning, 1992.
[5] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[6] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[7] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[8] Ananth Ranganathan, et al. The Levenberg-Marquardt Algorithm, 2004.
[9] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[10] Michael Kearns, et al. Near-Optimal Reinforcement Learning in Polynomial Time, 1998, Machine Learning.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[13] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[14] Jürgen Schmidhuber, et al. State-Dependent Exploration for Policy Gradient Methods, 2008, ECML/PKDD.
[15] Tom Schaul, et al. Natural Evolution Strategies, 2008, IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[16] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS.
[17] Tom Schaul, et al. Efficient Natural Evolution Strategies, 2009, GECCO.
[18] Tom Schaul, et al. Stochastic Search Using the Natural Gradient, 2009, ICML.
[19] Tom Schaul, et al. Exponential Natural Evolution Strategies, 2010, GECCO.
[20] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[21] Tom Schaul, et al. A Natural Evolution Strategy for Multi-objective Optimization, 2010, PPSN.
[22] Frank Sehnke, et al. Parameter-Exploring Policy Gradients, 2010, Neural Networks.
[23] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[24] Tom Schaul, et al. High Dimensions and Heavy Tails for Natural Evolution Strategies, 2011, GECCO.
[25] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[26] Sergey Levine, et al. Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models, 2015, arXiv.
[27] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[29] Shane Legg, et al. Human-level Control through Deep Reinforcement Learning, 2015, Nature.
[30] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract), 2012, IJCAI.
[31] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[32] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[33] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[34] Filip De Turck, et al. VIME: Variational Information Maximizing Exploration, 2016, NIPS.
[35] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[36] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[37] Tom Schaul, et al. Unifying Count-Based Exploration and Intrinsic Motivation, 2016, NIPS.
[38] Geoffrey E. Hinton, et al. Layer Normalization, 2016, arXiv.
[39] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[40] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[41] Wojciech Zaremba, et al. OpenAI Gym, 2016, arXiv.
[42] Filip De Turck, et al. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, 2016, NIPS.
[43] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[44] Shipra Agrawal, et al. Near-optimal Regret Bounds for Thompson Sampling, 2017.
[45] Xi Chen, et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning, 2017, arXiv.
[46] Georg Ostrovski, et al. Count-Based Exploration with Neural Density Models, 2017, ICML.
[47] S. Shankar Sastry, et al. Surprise-Based Intrinsic Motivation for Deep Reinforcement Learning, 2017, arXiv.
[48] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[49] Ilya Kostrikov, et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play, 2017, ICLR.