Free energy based policy gradients
Evangelos A. Theodorou | Emanuel Todorov | Jiri Najemnik
[1] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[2] H. Kushner,et al. A Monte Carlo method for sensitivity analysis and parametric optimization of nonlinear stochastic systems , 1991 .
[3] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[4] W. Fleming,et al. Controlled Markov processes and viscosity solutions , 1992 .
[5] M. James. Controlled Markov processes and viscosity solutions , 1994 .
[6] Wolfgang J. Runggaldier,et al. Connections between stochastic control and dynamic games , 1996, Math. Control. Signals Syst..
[7] Geoffrey E. Hinton,et al. Using Expectation-Maximization for Reinforcement Learning , 1997, Neural Computation.
[8] Jonathan H. Connell,et al. Robot Learning , 1993, Kluwer.
[9] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[10] Geoffrey E. Hinton,et al. Using EM for Reinforcement Learning , 2000 .
[11] Shin Ishii,et al. Reinforcement Learning for Biped Locomotion , 2002, ICANN.
[12] Jun Nakanishi,et al. Learning Attractor Landscapes for Learning Motor Primitives , 2002, NIPS.
[13] Sanjoy K. Mitter,et al. A Variational Approach to Nonlinear Estimation , 2003, SIAM J. Control. Optim..
[14] Jun Morimoto,et al. Robust Reinforcement Learning , 2005, Neural Computation.
[15] H. Sebastian Seung,et al. Learning to Walk in 20 Minutes , 2005 .
[16] Rémi Munos,et al. Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control , 2005, SIAM J. Control. Optim..
[17] Jun Morimoto,et al. Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid , 2005, AAAI.
[18] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[19] Charalambos D. Charalambous,et al. Stochastic Uncertain Systems Subject to Relative Entropy Constraints: Induced Norms and Monotonicity Properties of Minimax Games , 2007, IEEE Transactions on Automatic Control.
[20] Floyd B. Hanson,et al. Applied stochastic processes and control for jump-diffusions - modeling, analysis, and computation , 2007, Advances in design and control.
[21] Stefan Schaal,et al. Learning to Control in Operational Space , 2008, Int. J. Robotics Res..
[22] Stefan Schaal,et al. Reinforcement learning of motor skills with policy gradients , 2008, Neural Networks.
[23] H. Touchette. The large deviation approach to statistical mechanics , 2008, 0804.0327.
[24] Marc Toussaint,et al. Learning model-free robot control by a Monte Carlo EM algorithm , 2009, Auton. Robots.
[25] Carl E. Rasmussen,et al. Gaussian process dynamic programming , 2009, Neurocomputing.
[26] Jan Peters,et al. Policy Search for Motor Primitives , 2009, Künstliche Intell..
[27] Emanuel Todorov,et al. Policy gradients in linearly-solvable MDPs , 2010, NIPS.
[28] Stefan Schaal,et al. A Generalized Path Integral Control Approach to Reinforcement Learning , 2010, J. Mach. Learn. Res..
[29] Stefan Schaal,et al. Variable Impedance Control - A Reinforcement Learning Approach , 2010, Robotics: Science and Systems.
[30] Stefan Schaal,et al. Learning variable impedance control , 2011, Int. J. Robotics Res..
[31] Evangelos A. Theodorou,et al. Iterative path integral stochastic optimal control: Theory and applications to motor control , 2011 .
[32] Jan Peters. Machine Learning of Motor Skills for Robotics , 2022 .