Parameter-exploring policy gradients
Posted by 爱吃猫的鱼0 on October 9, 2021, 00:37
Frank Sehnke | Christian Osendorfer | Jürgen Schmidhuber | Alex Graves | Jan Peters | Thomas Rückstieß
[1] Jürgen Schmidhuber et al. State-Dependent Exploration for Policy Gradient Methods, 2008, ECML/PKDD.
[2] Tom Schaul et al. Fitness Expectation Maximization, 2008, PPSN.
[3] Stefan Schaal et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients, 2008.
[4] Stefan Schaal et al. Natural Actor-Critic, 2003, Neurocomputing.
[5] Martin Lauer et al. Making a Robot Learn to Play Soccer Using Reward and Punishment, 2007, KI.
[6] Martin A. Riedmiller et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[7] Stefan Schaal et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[8] Rémi Munos. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[9] Nicol N. Schraudolph et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation, 2005, NIPS.
[10] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[11] Douglas Aberdeen. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[12] Nikolaus Hansen et al. Completely Derandomized Self-Adaptation in Evolution Strategies, 2001, Evolutionary Computation.
[13] Peter L. Bartlett et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[14] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[15] James C. Spall. Implementation of the simultaneous perturbation algorithm for stochastic optimization, 1998.
[16] James C. Spall. An Overview of the Simultaneous Perturbation Method for Efficient Optimization, 1998.
[17] Judy A. Franklin et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.
[18] Hans-Paul Schwefel. Evolution and optimum seeking, 1995, Sixth-generation computer technology series.
[19] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine, 1990.