2008 Special Issue: Reinforcement learning of motor skills with policy gradients
[1] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[2] Jun Morimoto, et al. Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid, 2005, AAAI.
[3] T. Moon, et al. Mathematical Methods and Algorithms for Signal Processing, 1999.
[4] Douglas Aberdeen, et al. POMDPs and Policy Gradients, 2006.
[5] Lennart Råde, et al. Springers Mathematische Formeln, 1996.
[6] Shin Ishii, et al. Fast and Stable Learning of Quasi-Passive Dynamic Walking by an Unstable Biped Robot based on Off-Policy Natural Actor-Critic, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[7] James C. Spall, et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, 2003, Wiley-Interscience Series in Discrete Mathematics and Optimization.
[8] Oliver G. Selfridge, et al. Real-time learning: a ball on a beam, 1992, IJCNN International Joint Conference on Neural Networks.
[9] Shin Ishii, et al. Reinforcement Learning for CPG-Driven Biped Robot, 2004, AAAI.
[10] Stefan Schaal, et al. Rapid synchronization and accurate phase-locking of rhythmic motor primitives, 2005, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[11] A. Berny, et al. Statistical machine learning and combinatorial optimization, 2001.
[12] James C. Spall, et al. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control (Spall, J.C.), 2007.
[13] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[14] Jin Yu, et al. Natural Actor-Critic for Road Traffic Optimisation, 2006, NIPS.
[15] Vijaykumar Gullapalli, et al. Learning Control Under Extreme Uncertainty, 1992, NIPS.
[16] Noah J. Cowan, et al. Efficient Gradient Estimation for Motor Control Learning, 2002, UAI.
[17] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[18] Jan Peters. Machine Learning of Motor Skills for Robotics, PhD thesis, University of Southern California, 2007.
[19] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[20] Jongho Kim, et al. An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm, 2005, CIS.
[21] Shin Ishii, et al. Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot, 2004, PPSN.
[22] Stefan Schaal, et al. A Kendama learning robot based on a dynamic optimization theory, 1995, 4th IEEE International Workshop on Robot and Human Communication.
[23] J. Spall, et al. Simulation-Based Optimization with Stochastic Approximation Using Common Random Numbers, 1999.
[24] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[25] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[26] Bruno Siciliano, et al. Modeling and Control of Robot Manipulators, 1995.
[27] Jun Nakanishi, et al. Learning Attractor Landscapes for Learning Motor Primitives, 2002, NIPS.
[28] H. Sebastian Seung, et al. Learning to Walk in 20 Minutes, 2005.
[29] Stefan Schaal, et al. Reinforcement Learning for Humanoid Robotics, 2003.
[30] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent, 1998.
[31] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[32] Tamar Flash, et al. Motor primitives in vertebrates and invertebrates, 2005, Current Opinion in Neurobiology.
[33] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[34] S. Ishii, et al. Off-Policy Natural Actor-Critic, 2005.
[35] Andrew Zisserman, et al. Advances in Neural Information Processing Systems (NIPS), 2007.
[36] M. Kawato, et al. Trajectory formation of arm movement by a neural network with forward and inverse dynamics models, 1993.
[37] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[38] Jun Nakanishi, et al. Movement imitation with nonlinear dynamical systems in humanoid robots, 2002, IEEE International Conference on Robotics and Automation.
[39] Christopher G. Atkeson, et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming, 1993, NIPS.
[40] J. Spall, et al. Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation, 1997, American Control Conference.
[41] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[42] Wim Meiden, et al. Book review: Springers mathematische Formeln. Taschenbuch für Ingenieure, Naturwissenschaftler, Wirtschaftswissenschaftler, 1998.
[43] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[44] D. Harville. Matrix Algebra From a Statistician's Perspective, 1998.
[45] Jun Morimoto, et al. Learning from demonstration and adaptation of biped locomotion, 2004, Robotics Auton. Syst.
[46] M. Ciletti, et al. The computation and theory of optimal control, 1972.
[47] J. Spall. Stochastic Optimization, 2002.
[48] D. Signorini, et al. Neural networks, 1995, The Lancet.
[49] Jun Morimoto, et al. Minimax Differential Dynamic Programming: An Application to Robust Biped Walking, 2002, NIPS.
[50] Vijay Balasubramanian, et al. Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions, 1996, Neural Computation.
[51] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[52] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[53] Sham M. Kakade, et al. On the sample complexity of reinforcement learning, 2003.
[54] L. Hasdorff. Gradient Optimization and Nonlinear Control, 1976.
[55] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[56] David Q. Mayne, et al. Differential dynamic programming, 1972, The Mathematical Gazette.
[57] Shin Ishii, et al. Reinforcement Learning for Biped Locomotion, 2002, ICANN.
[58] Alison L. Gibbs, et al. On Choosing and Bounding Probability Metrics, 2002, math/0209021.
[59] R. Fletcher. Practical Methods of Optimization, 1988.
[60] Vijaykumar Gullapalli, et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.
[61] Jeff G. Schneider, et al. Covariant policy search, 2003, IJCAI.
[62] Jun Nakanishi, et al. Learning Movement Primitives, 2005, ISRR.
[63] Takayuki Kanda, et al. Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning, 2005, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[64] Aude Billard, et al. Reinforcement learning for imitating constrained reaching movements, 2007, Adv. Robotics.
[65] Michael C. Fu, et al. Feature Article: Optimization for simulation: Theory vs. Practice, 2002, INFORMS J. Comput.
[66] V. Gullapalli, et al. Acquiring robot skills via reinforcement learning, 1994, IEEE Control Systems.
[67] Peter W. Glynn, et al. Likelihood ratio gradient estimation: an overview, 1987, WSC '87.
[68] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[69] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, IEEE International Conference on Robotics and Automation (ICRA).
[70] Amir Karniel, et al. Minimum Acceleration Criterion with Constraints Implies Bang-Bang Control as an Underlying Principle for Optimal Trajectories of Arm Reaching Movements, 2008, Neural Computation.
[71] Judy A. Franklin, et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.