Reinforcement learning and optimal adaptive control: An overview and implementation examples
Said Ghani Khan | Guido Herrmann | Anthony G. Pipe | Chris Melhuish | Frank L. Lewis
[1] P.J. Werbos,et al. Using ADP to Understand and Replicate Brain Intelligence: the Next Level Design , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[2] Dimitri P. Bertsekas,et al. Q-learning and enhanced policy iteration in discounted dynamic programming , 2010, 49th IEEE Conference on Decision and Control (CDC).
[3] Leslie Pack Kaelbling,et al. Reinforcement learning for robot control , 2002, SPIE Optics East.
[4] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[5] Sebastian Thrun,et al. A Review of Reinforcement Learning , 2000, AI Mag..
[6] Frank L. Lewis,et al. Reinforcement Learning for Partially Observable Dynamic Processes: Adaptive Dynamic Programming Using Measured Output Data , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[7] Warren B. Powell,et al. Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.
[8] Frank L. Lewis,et al. Adaptive optimal control for continuous-time linear systems based on policy iteration , 2009, Autom..
[9] Richard S. Sutton,et al. Reinforcement Learning is Direct Adaptive Optimal Control , 1992, 1991 American Control Conference.
[10] Sanjay Sharma,et al. Application of Soft Computing Techniques to a LQG Controller Design , 2008 .
[11] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[12] Radoslaw Romuald Zakrzewski,et al. Neural network control of nonlinear discrete time systems , 1994 .
[13] Murad Abu-Khalaf,et al. Nonlinear H2/H∞ Constrained Feedback Control: A Practical Design Approach Using Neural Networks , 2007 .
[14] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[15] Philippe Preux,et al. Recent Advances in Reinforcement Learning , 2008, Lecture Notes in Computer Science.
[16] Bart De Schutter,et al. A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[17] Valery Kuzmin. Connectionist Q-learning in robot control task , 2002 .
[18] Bart De Schutter,et al. Multi-Agent Reinforcement Learning: A Survey , 2006, 2006 9th International Conference on Control, Automation, Robotics and Vision.
[19] Frank L. Lewis,et al. A Q-learning based Cartesian model reference compliance controller implementation for a humanoid robot arm , 2011, 2011 IEEE 5th International Conference on Robotics, Automation and Mechatronics (RAM).
[20] Luigi Fortuna,et al. Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control , 2009 .
[21] Benjamin Van Roy,et al. A neuro-dynamic programming approach to retailer inventory management , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[22] Martin A. Riedmiller,et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[23] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[24] Wolfram Burgard,et al. Robotics: Science and Systems XV , 2010 .
[25] Dimitri P. Bertsekas,et al. Neuro-Dynamic Programming: An Overview and Recent Results , 2006, OR.
[26] Frank L. Lewis,et al. Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[27] D. Ernst,et al. Approximate Value Iteration in the Reinforcement Learning Context: Application to Electrical Power System Control , 2005 .
[28] Jennie Si,et al. Handbook of Learning and Approximate Dynamic Programming (IEEE Press Series on Computational Intelligence) , 2004 .
[29] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[30] Philippe Preux,et al. Basis Expansion in Natural Actor Critic Methods , 2008, EWRL.
[31] Martin A. Riedmiller,et al. Reinforcement learning in feedback control , 2011, Machine Learning.
[32] Dimitri P. Bertsekas,et al. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC , 2005, Eur. J. Control.
[33] Matthieu Geist,et al. Tracking in Reinforcement Learning , 2009, ICONIP.
[34] Frank L. Lewis,et al. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach , 2005, Autom..
[35] Frank L. Lewis,et al. A Novel Q-Learning Based Adaptive Optimal Controller Implementation for a Humanoid Robotic Arm , 2011 .
[36] D. Bertsekas. Approximate policy iteration: a survey and some new methods , 2011 .
[37] Lihong Li,et al. Online exploration in least-squares policy iteration , 2009, AAMAS.
[38] Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
[39] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[40] Kuu-young Young,et al. Reinforcement Learning and Robust Control for Robot Compliance Tasks , 1998, J. Intell. Robotic Syst..
[41] Richard S. Sutton,et al. Connectionist Learning for Control , 1995 .
[42] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[43] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[44] Andrew G. Barto,et al. Connectionist learning for control: an overview , 1990 .
[45] Chris Gaskett,et al. Q-Learning for Robot Control , 2002 .
[46] Kaspar Althoefer,et al. Reinforcement learning in a rule-based navigator for robotic manipulators , 2001, Neurocomputing.
[47] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.
[48] Sungchul Kang,et al. Learning robot stiffness for contact tasks using the natural actor-critic , 2008, 2008 IEEE International Conference on Robotics and Automation.
[49] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[50] Matthieu Geist,et al. Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences , 2011, IJCAI.
[51] Richard S. Sutton,et al. Reinforcement Learning: Past, Present and Future , 1998, SEAL.
[52] Sungchul Kang,et al. Impedance Learning for Robotic Contact Tasks Using Natural Actor-Critic Algorithm , 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[53] E.V. Kampen,et al. Online Adaptive Critic Flight Control using Approximated Plant Dynamics , 2006, 2006 International Conference on Machine Learning and Cybernetics.
[54] Jan Peters,et al. Reinforcement learning for optimal control of arm movements , 2007 .
[55] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009 .
[56] Xin Xu,et al. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning , 2007, IEEE Transactions on Neural Networks.
[57] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[58] Frank L. Lewis,et al. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2010, Autom..
[59] Anthony G. Pipe,et al. An Architecture for Learning "Potential Field" Cognitive Maps with an Application to Mobile Robotics , 2000, Adapt. Behav..
[60] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[61] Stefan Schaal,et al. Learning tasks from a single demonstration , 1997, Proceedings of International Conference on Robotics and Automation.
[62] Guido Herrmann,et al. Safe Adaptive Compliance Control of a Humanoid Robotic Arm with Anti-Windup Compensation and Posture Control , 2010, Int. J. Soc. Robotics.
[63] Mohamed A. Zohdy,et al. Reinforcement learning control of nonlinear multi-link system , 2001 .
[64] D. Bertsekas. Dynamic Programming and Suboptimal Control: From ADP to MPC , 2005, Proceedings of the 44th IEEE Conference on Decision and Control.
[65] Matthieu Geist,et al. Revisiting Natural Actor-Critics with Value Function Approximation , 2010, MDAI.
[66] Frank L. Lewis,et al. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.
[67] Frank L. Lewis,et al. Robot Manipulator Control: Theory and Practice , 2003 .
[68] Frank L. Lewis,et al. Adaptive dynamic programming applied to a 6DoF quadrotor , 2011 .
[69] Christian Igel,et al. Reinforcement learning in a nutshell , 2007, ESANN.
[70] Martin A. Riedmiller,et al. Reinforcement learning for robot soccer , 2009, Auton. Robots.
[71] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[72] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[73] Stefan Schaal,et al. Variable Impedance Control - A Reinforcement Learning Approach , 2010, Robotics: Science and Systems.
[74] Toshiyuki Kondo,et al. Biological robot arm motion through reinforcement learning , 2002, Proceedings of the 41st SICE Annual Conference. SICE 2002..
[75] Dimitri P. Bertsekas,et al. Pathologies of temporal difference methods in approximate dynamic programming , 2010, 49th IEEE Conference on Decision and Control (CDC).
[76] Stefan Schaal,et al. Learning to Control in Operational Space , 2008, Int. J. Robotics Res..
[77] Yoav Shoham,et al. Multi-Agent Reinforcement Learning:a critical survey , 2003 .
[78] Meng Joo Er,et al. Real-time dynamic fuzzy Q-learning and control of mobile robots , 2004, 2004 5th Asian Control Conference (IEEE Cat. No.04EX904).
[79] Paul J. Werbos,et al. Foreword: ADP - The Key Direction for Future Research in Intelligent Control and Understanding Brain Intelligence , 2008, IEEE Trans. Syst. Man Cybern. Part B.
[80] O. Kulyba,et al. Reinforcement Learning Interfaces for Biomedical Database Systems , 2006, 2006 International Conference of the IEEE Engineering in Medicine and Biology Society.
[81] Hui Peng,et al. A Survey of Approximate Dynamic Programming , 2009, 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics.
[82] Greg Welch,et al. An Introduction to the Kalman Filter , 1994 .
[83] H. Kappen. An introduction to stochastic control theory, path integrals and reinforcement learning , 2007 .
[84] P. Werbos. ADP: Goals, Opportunities and Principles , 2004 .
[85] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[86] Damien Ernst,et al. Using prior knowledge to accelerate online least-squares policy iteration , 2010, 2010 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR).
[87] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[88] Anthony G. Pipe,et al. Towards Safe Human-Robot Interaction , 2011, TAROS.
[89] Martin A. Riedmiller,et al. Neural Reinforcement Learning Controllers for a Real Robot Application , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.
[90] Dimitri P. Bertsekas,et al. Temporal Difference Methods for General Projected Equations , 2011, IEEE Transactions on Automatic Control.
[91] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[92] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[93] Paul J. Werbos,et al. 2009 Special Issue: Intelligence in the brain: A theory of how it works and how to build it , 2009 .
[94] Sue Ellen Haupt,et al. Artificial Intelligence Methods in the Environmental Sciences , 2008 .
[96] Bart De Schutter,et al. Online least-squares policy iteration for reinforcement learning control , 2010, Proceedings of the 2010 American Control Conference.
[97] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[98] Frank L. Lewis,et al. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control , 2007, Autom..
[99] Matthieu Geist,et al. Kalman Temporal Differences , 2010, J. Artif. Intell. Res..
[100] H. Kappen. Path integrals and symmetry breaking for optimal control theory , 2005, physics/0505066.
[101] Robert Babuska,et al. Experience Replay for Real-Time Reinforcement Learning Control , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[102] Hitesh Shah,et al. Reinforcement learning control of robot manipulators in uncertain environments , 2009, 2009 IEEE International Conference on Industrial Technology.
[103] John K. Williams,et al. Reinforcement Learning of Optimal Controls , 2009 .
[104] R. E. Kalman. A New Approach to Linear Filtering and Prediction Problems , 1960 (reprinted in T. Başar, ed., 2001).
[105] Jennie Si,et al. ADP: Goals, Opportunities and Principles , 2004 .
[106] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[107] Guido Herrmann,et al. Adaptive multi-dimensional compliance control of a humanoid robotic arm with anti-windup compensation , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[108] Mohamed A. Zohdy,et al. Application of reinforcement learning control to a nonlinear dexterous robot , 1999, Proceedings of the 38th IEEE Conference on Decision and Control (Cat. No.99CH36304).
[109] Stefan Schaal,et al. Reinforcement learning of motor skills in high dimensions: A path integral approach , 2010, 2010 IEEE International Conference on Robotics and Automation.
[110] Paul J. Werbos,et al. Approximate dynamic programming for real-time control and neural modeling , 1992 .
[111] Frank L. Lewis,et al. 2009 Special Issue: Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems , 2009 .
[112] F. Lewis,et al. Model-free Q-learning designs for discrete-time zero-sum games with application to H-infinity control , 2007, 2007 European Control Conference (ECC).
[113] B. L. Digney. Nested Q-learning of hierarchical control structures , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).
[114] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[115] Wei Liu,et al. Enhanced Q-learning algorithm for dynamic power management with performance constraint , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).