Projected Natural Actor-Critic
Sridhar Mahadevan | Philip S. Thomas | Stephen Giguere | William Dabney
[1] Sridhar Mahadevan, et al. Basis Adaptation for Sparse Nonlinear Reinforcement Learning, 2013, AAAI.
[2] Antonie J. van den Bogert, et al. A Real-Time, 3-D Musculoskeletal Model for Dynamic Simulation of Arm Movements, 2009, IEEE Transactions on Biomedical Engineering.
[3] Andrew G. Barto, et al. Lyapunov Design for Safe Reinforcement Learning, 2003, J. Mach. Learn. Res.
[4] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[5] Philip Thomas, et al. Bias in Natural Actor-Critic Algorithms, 2014, ICML.
[6] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[7] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[8] Philip S. Thomas, et al. Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm, 2009, IAAI.
[9] Scott Kuindersma, et al. Variational Bayesian Optimization for Runtime Risk-Sensitive Control, 2012, Robotics: Science and Systems.
[10] Fritz Wysotzki, et al. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints, 2005, J. Artif. Intell. Res.
[11] Scott Kuindersma, et al. Dexterous mobility with the uBot-5 mobile manipulator, 2009, International Conference on Advanced Robotics.
[12] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[13] Ari Arapostathis, et al. Control of Markov chains with safety bounds, 2005, IEEE Transactions on Automation Science and Engineering.
[14] A. J. van den Bogert. A Proportional Derivative FES Controller for Planar Arm Movement, 2007.
[15] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[16] Toru Maruyama. On a Few Developments in Convex Analysis (in Japanese), 1977.
[17] Bo Liu, et al. Sparse Q-learning with Mirror Descent, 2012, UAI.
[18] Shun-ichi Amari, et al. Why natural gradient?, 1998, ICASSP.
[19] Judy A. Franklin, et al. Biped dynamic walking using reinforcement learning, 1997, Robotics Auton. Syst.
[20] Patrick M. Pilarski, et al. Model-free reinforcement learning with continuous action in practice, 2012, American Control Conference (ACC).
[21] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.
[22] Nuno C. Martins, et al. Control Design for Markov Chains under Safety Constraints: A Convex Approach, 2012, arXiv.
[23] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[24] Robert F. Kirsch, et al. Combined feedforward and feedback control of a redundant, nonlinear, dynamic musculoskeletal system, 2009, Medical & Biological Engineering & Computing.
[25] Andrew G. Barto, et al. Motor primitive discovery, 2012, IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[26] Neil Munro, et al. Fast calculation of stabilizing PID controllers, 2003, Autom.
[27] Roderic A. Grupen, et al. Whole-body strategies for mobility and manipulation, 2010.
[28] C. Lynch, et al. Functional Electrical Stimulation, 2008, IEEE Control Systems.
[29] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[30] D. K. Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[31] Karl Johan Åström, et al. PID Controllers: Theory, Design, and Tuning, 1995.