Large space dimension Reinforcement Learning for Robot Position/Force Discrete Control

In this work, a large space dimension reinforcement learning (RL) approximation is developed for discrete impedance position/force control of robot manipulators interacting with an unknown environment. The $Q$-value function is designed in the sense of optimal control theory. The approximator is based on normalized radial basis functions (NRBFs) and is built using the $K$-means clustering algorithm, which generates a family of approximators for the $Q$-value function. The RL algorithms learn on-line the optimal impedance model, which is equivalent to the desired force, without any prior knowledge of the environment dynamics; this desired force feeds a force controller, whose output in turn feeds the position controller. Real-time experiments are presented using a 2-degree-of-freedom (DOF) pan-and-tilt robot and a 6-DOF force/torque (F/T) sensor.
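A minimal sketch of the kind of approximator the abstract describes is given below, assuming a Python/NumPy setting with scikit-learn's KMeans. The class and parameter names (NRBFQApproximator, n_centers, sigma, lr) are illustrative only and not taken from the paper; a generic one-step Q-learning update stands in for the paper's impedance-learning stage purely to show how K-means centers and NRBF features combine into a $Q$-value approximator.

```python
# Hypothetical sketch (not the authors' code): a Q-value approximator built from
# normalized radial basis functions (NRBFs) whose centers are placed by K-means
# clustering over observed states.
import numpy as np
from sklearn.cluster import KMeans

class NRBFQApproximator:
    def __init__(self, states, n_centers=20, n_actions=5, sigma=1.0, lr=0.1):
        # Place RBF centers with K-means over a batch of visited states.
        self.centers = KMeans(n_clusters=n_centers, n_init=10).fit(states).cluster_centers_
        self.sigma = sigma
        self.lr = lr
        self.weights = np.zeros((n_centers, n_actions))  # one weight column per discrete action

    def features(self, s):
        # Gaussian RBF activations, normalized to sum to one (NRBF).
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.sigma ** 2))
        return phi / (phi.sum() + 1e-12)

    def q(self, s):
        # Approximate Q(s, a) for all discrete actions.
        return self.features(s) @ self.weights

    def td_update(self, s, a, r, s_next, gamma=0.95):
        # One-step Q-learning update on the linear NRBF weights.
        target = r + gamma * np.max(self.q(s_next))
        phi = self.features(s)
        self.weights[:, a] += self.lr * (target - phi @ self.weights[:, a]) * phi
```

In this sketch the learned $Q$-values would drive the choice of impedance parameters (and hence the desired force) that the cascaded force/position controllers track; the controller cascade itself is outside the scope of the example.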
