Large space dimension Reinforcement Learning for Robot Position/Force Discrete Control

In this work, a large space dimension reinforcement learning (RL) approximation is developed for discrete impedance position/force control of robot manipulators interacting with an unknown environment. The $Q$-value function is designed in the sense of optimal control theory. The approximator is based on normalized radial basis functions (NRBFs) and is built using the $K$-means clustering algorithm, which generates a family of approximators for the $Q$-value function. The RL algorithms learn on-line the optimal impedance model, which is equivalent to the desired force, without any prior knowledge of the environment dynamics; this desired force feeds a force controller, whose output in turn feeds the position controller. Real-time experiments are presented using a 2-degree-of-freedom (DOF) pan-and-tilt robot and a 6-DOF force/torque (F/T) sensor.
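A minimal sketch of the kind of approximator the abstract describes is given below, assuming a Python/NumPy setting with scikit-learn's KMeans. The class and parameter names (NRBFQApproximator, n_centers, sigma, lr) are illustrative only and not taken from the paper; a generic one-step Q-learning update stands in for the paper's impedance-learning stage purely to show how K-means centers and NRBF features combine into a $Q$-value approximator.

```python
# Hypothetical sketch (not the authors' code): a Q-value approximator built from
# normalized radial basis functions (NRBFs) whose centers are placed by K-means
# clustering over observed states.
import numpy as np
from sklearn.cluster import KMeans

class NRBFQApproximator:
    def __init__(self, states, n_centers=20, n_actions=5, sigma=1.0, lr=0.1):
        # Place RBF centers with K-means over a batch of visited states.
        self.centers = KMeans(n_clusters=n_centers, n_init=10).fit(states).cluster_centers_
        self.sigma = sigma
        self.lr = lr
        self.weights = np.zeros((n_centers, n_actions))  # one weight column per discrete action

    def features(self, s):
        # Gaussian RBF activations, normalized to sum to one (NRBF).
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        phi = np.exp(-d2 / (2.0 * self.sigma ** 2))
        return phi / (phi.sum() + 1e-12)

    def q(self, s):
        # Approximate Q(s, a) for all discrete actions.
        return self.features(s) @ self.weights

    def td_update(self, s, a, r, s_next, gamma=0.95):
        # One-step Q-learning update on the linear NRBF weights.
        target = r + gamma * np.max(self.q(s_next))
        phi = self.features(s)
        self.weights[:, a] += self.lr * (target - phi @ self.weights[:, a]) * phi
```

In this sketch the learned $Q$-values would drive the choice of impedance parameters (and hence the desired force) that the cascaded force/position controllers track; the controller cascade itself is outside the scope of the example.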
