Optimal Critic Learning for Robot Control in a Time-Varying Environment

In this paper, optimal critic learning is developed for robot control in a time-varying environment. The unknown environment is modeled as a linear system with time-varying parameters, and impedance control is employed for the interaction control. The desired impedance parameters are obtained as the optimal trade-off between trajectory tracking and force regulation. Q-function-based critic learning is developed to determine the optimal impedance parameters without knowledge of the system dynamics. Simulation results are presented and compared with existing methods, verifying the efficacy of the proposed method.
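
As a rough illustration of the Q-function-based critic idea described above (determining optimal feedback parameters from data, without a model of the dynamics), the following Python sketch applies model-free policy iteration with a quadratic Q-function to a discrete-time linear-quadratic problem. It is not the paper's algorithm: the environment here is time-invariant, no impedance model is included, and the matrices, dimensions, and learning settings are assumed placeholders.

```python
# Minimal sketch (assumed, not the paper's exact method): a Q-function-based
# critic for a discrete-time linear-quadratic problem, in the spirit of
# Q-learning for LQR.  The system matrices A, B, the cost weights, and all
# dimensions below are illustrative placeholders.
import numpy as np

np.random.seed(0)
n, m = 2, 1                            # state and input dimensions (assumed)
A = np.array([[0.99, 0.10],
              [0.00, 0.95]])           # used only to simulate; never shown to the critic
B = np.array([[0.00],
              [0.10]])
Qc, Rc = np.eye(n), 0.1 * np.eye(m)    # quadratic cost weights (assumed)
gamma = 0.99                           # discount factor

def phi(x, u):
    """Quadratic features of z = [x; u]; Q(x, u) is linear in these."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(n + m)]

K = np.zeros((m, n))                   # initial feedback policy u = -K x
for it in range(20):                   # policy iteration with a Q-function critic
    # Policy evaluation: fit Q(x, u) = [x; u]^T H [x; u] by least squares on
    # sampled transitions, without using A or B (model-free).
    Phi, y = [], []
    x = np.random.randn(n)
    for k in range(400):
        u = -K @ x + 0.1 * np.random.randn(m)       # exploring action
        cost = x @ Qc @ x + u @ Rc @ u
        x_next = A @ x + B @ u
        u_next = -K @ x_next                        # on-policy next action
        Phi.append(phi(x, u) - gamma * phi(x_next, u_next))
        y.append(cost)
        x = x_next if np.linalg.norm(x_next) < 1e3 else np.random.randn(n)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)

    # Rebuild the symmetric Q-function matrix H from its upper-triangular fit.
    U = np.zeros((n + m, n + m))
    U[np.triu_indices(n + m)] = theta
    H = (U + U.T) / 2.0

    # Policy improvement: the greedy action minimizes Q over u,
    # giving u = -inv(H_uu) H_ux x.
    Huu, Hux = H[n:, n:], H[n:, :n]
    K = np.linalg.solve(Huu, Hux)

print("learned feedback gain K =", K)
```

In this sketch the learned gain plays the role of the tunable feedback parameters; in the paper's setting those parameters are the impedance parameters and the environment is time-varying, which this simplified example does not capture.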
