Neurodynamics Adaptive Reward and Action for Hand-to-Eye Calibration With Deep Reinforcement Learning

Calibration performed by a robotic manipulator is crucial in industrial intelligent production, as it ensures precise and accurate measurements. In this paper, we present a new method for addressing the hand-to-eye calibration problem using deep reinforcement learning. Our proposed algorithm builds on an actor-critic framework and incorporates neurodynamics adaptive reward and action functions, which improve convergence, reduce the dependence on initial values, and overcome the local-convergence issues of traditional deep reinforcement learning methods. Additionally, we introduce a step-wise mechanism, guided by an attention mechanism and zero stability, to handle the complexity of the calibration task in challenging environments. A number of experiments were conducted to demonstrate the validity of the proposed algorithm. The experimental results show that our algorithm achieves a nearly 100% success rate after the training phase. We also compared the proposed algorithm with other widely used methods, such as deep deterministic policy gradient (DDPG) and soft actor-critic (SAC), to further demonstrate its effectiveness.
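The abstract does not include source code, so the following is only a minimal sketch of the kind of setup it describes: a generic actor-critic pair together with a zeroing-dynamics-style shaped reward on the calibration error. All names (`Actor`, `Critic`, `shaped_reward`, `gamma_zd`), network sizes, and the specific shaping formula (penalizing deviation from the ideal error dynamics e_dot = -gamma * e) are illustrative assumptions, not the authors' formulation.

```python
# Hypothetical sketch (not the paper's released code) of an actor-critic pair
# with a neurodynamics-inspired shaped reward for hand-to-eye calibration.
import numpy as np
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Maps an observation (e.g., current calibration state) to a bounded action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class Critic(nn.Module):
    """Estimates the action value Q(s, a) for the actor's policy."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def shaped_reward(err: np.ndarray, prev_err: np.ndarray,
                  gamma_zd: float = 1.0, dt: float = 0.05) -> float:
    """Zeroing-dynamics-inspired shaping (assumed form): reward trajectories whose
    calibration error evolves roughly like e_dot = -gamma_zd * e, i.e. decays
    exponentially toward zero."""
    err_dot = (err - prev_err) / dt
    residual = np.linalg.norm(err_dot + gamma_zd * err)  # distance from ideal dynamics
    return float(-np.linalg.norm(err) - 0.1 * residual)


# Tiny usage example with a fabricated 3-D calibration error.
prev_err = np.array([0.03, -0.02, 0.04])
err = np.array([0.02, -0.01, 0.03])
print(shaped_reward(err, prev_err))
```

In this sketch the residual term favors transitions whose error decays approximately exponentially, which is one plausible reading of a "neurodynamics adaptive reward"; the reward and action functions used in the paper may differ.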
