Reinforcement Learning for Robotic Time-optimal Path Tracking Using Prior Knowledge

Time-optimal path tracking is an important tool for industrial robots and has attracted the attention of numerous researchers. In most time-optimal path tracking problems, the actuator torque constraints are assumed to be conservative, which ignores the motor characteristics: in reality, the torque constraints are velocity-dependent, and the relationship between torque and velocity is piecewise linear. Because accounting for these motor characteristics makes the problem harder to solve, this study proposes an improved Q-learning algorithm for robotic time-optimal path tracking that exploits prior knowledge. To address the limitations of the standard Q-learning algorithm, an improved action-value function is introduced to accelerate convergence. The proposed algorithm follows a reward-and-penalty scheme, rewarding actions that satisfy the constraints and penalizing actions that violate them, and thereby obtains a time-optimal trajectory that respects the constraint conditions. The effectiveness of the algorithm is verified by experiments.
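To make the reward-and-penalty idea concrete, the following is a minimal sketch, not the authors' implementation: tabular Q-learning over a discretised (s, s_dot) phase plane along the path, with a toy single-joint surrogate for the path dynamics, a hypothetical piecewise-linear torque-velocity limit, a reward of -Δt for feasible transitions, and a large penalty when the torque constraint is broken. All numerical values, the helper names (torque_limit, path_dynamics, step), and the surrogate dynamics are illustrative assumptions; the paper's improved action-value update and prior-knowledge initialisation are not reproduced here.

```python
import numpy as np

# --- Hypothetical problem data (placeholders, not from the paper) ----------
N_S = 50                                 # grid points along the path parameter s in [0, 1]
SDOT_MAX = 2.0                           # upper bound of the path-velocity grid
N_SDOT = 40                              # discretisation of the path velocity s_dot
ACTIONS = np.linspace(-3.0, 3.0, 7)      # candidate path accelerations s_ddot
DS = 1.0 / N_S

def torque_limit(joint_velocity):
    """Velocity-dependent torque bound: piecewise linear in |velocity|.
    Constant below a rated speed, dropping linearly above it
    (illustrative numbers, not real motor data)."""
    rated, tau_rated, slope = 1.0, 10.0, 4.0
    v = abs(joint_velocity)
    return tau_rated if v <= rated else max(tau_rated - slope * (v - rated), 0.0)

def path_dynamics(s, s_dot, s_ddot):
    """Single-joint surrogate of the projected dynamics
    tau = a(s)*s_ddot + b(s)*s_dot**2 + c(s); coefficients are toy functions."""
    a = 1.0 + 0.5 * np.sin(2 * np.pi * s)
    b = 0.3 * np.cos(2 * np.pi * s)
    c = 0.5
    tau = a * s_ddot + b * s_dot ** 2 + c
    joint_velocity = (1.0 + 0.5 * np.sin(2 * np.pi * s)) * s_dot  # dq/ds * s_dot
    return tau, joint_velocity

def step(i_s, s_dot, s_ddot):
    """Advance one grid cell along the path; return next state, reward, done."""
    s = i_s * DS
    s_dot_next_sq = s_dot ** 2 + 2.0 * s_ddot * DS
    if s_dot_next_sq <= 1e-6:                       # robot stalls: heavy penalty
        return i_s, 0.0, -100.0, True
    s_dot_next = min(np.sqrt(s_dot_next_sq), SDOT_MAX)
    tau, qd = path_dynamics(s, s_dot, s_ddot)
    if abs(tau) > torque_limit(qd):                 # constraint broken: penalty
        return i_s, s_dot, -100.0, True
    dt = 2.0 * DS / (s_dot + s_dot_next)            # time spent on this cell
    return i_s + 1, s_dot_next, -dt, (i_s + 1 >= N_S)   # less time -> larger reward

# --- Tabular Q-learning over the discretised (s, s_dot) state space --------
Q = np.zeros((N_S + 1, N_SDOT, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.99, 0.2

def sdot_index(s_dot):
    return min(int(s_dot / SDOT_MAX * (N_SDOT - 1)), N_SDOT - 1)

rng = np.random.default_rng(0)
for episode in range(5000):
    i_s, s_dot, done = 0, 0.1, False                # start almost at rest
    while not done:
        j = sdot_index(s_dot)
        a_idx = (rng.integers(len(ACTIONS)) if rng.random() < eps
                 else int(np.argmax(Q[i_s, j])))
        i_s2, s_dot2, r, done = step(i_s, s_dot, ACTIONS[a_idx])
        target = r + (0.0 if done else gamma * np.max(Q[i_s2, sdot_index(s_dot2)]))
        Q[i_s, j, a_idx] += alpha * (target - Q[i_s, j, a_idx])
        i_s, s_dot = i_s2, s_dot2
```

After training, following the greedy policy argmax over Q from the start state traces out a velocity profile that stays inside the (velocity-dependent) torque limits while accumulating as little traversal time as possible; the paper's contribution would replace the plain update above with the improved action-value function to speed up this convergence.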
