Low-Dimensional Learning for Complex Robots

This paper presents an algorithm for learning the switching policy and the boundary conditions between primitive controllers that maximize the translational movement of a complex locomoting system. The algorithm learns an optimal action for each boundary condition rather than one for each discretized state-action pair of the system, as is typically done in machine learning. The system is modeled as a hybrid system because it contains both discrete and continuous dynamics. This hybrid formulation, together with the abstraction of learning boundary-action pairs, mitigates the “curse of dimensionality.” The effectiveness of the learning algorithm is demonstrated on both a simulated system and a physical robotic system. In both cases, the algorithm learns the hybrid control strategy that maximizes the forward translational movement of the system without human involvement.
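The abstract's core idea, learning one value per boundary-action pair instead of per discretized state-action pair, can be illustrated with a minimal sketch. Everything below is hypothetical and not from the paper: the boundary names, the primitive-controller set, and the `rollout` stand-in for the robot or simulator are assumptions, and the epsilon-greedy, bandit-style update is a generic placeholder for the paper's actual learning rule.

```python
import random

# Hypothetical guard conditions of the hybrid system and the primitive
# controllers available at each one. With B boundaries and A primitives,
# the learner stores only B*A values, independent of how finely the
# continuous state would otherwise have to be discretized.
BOUNDARIES = ["stance_to_swing", "swing_to_stance"]
ACTIONS = ["primitive_a", "primitive_b", "primitive_c"]

q = {(b, a): 0.0 for b in BOUNDARIES for a in ACTIONS}  # value estimates
counts = {k: 0 for k in q}                              # update counts

def choose_action(boundary, epsilon=0.1):
    """Epsilon-greedy choice of which primitive to switch to at a boundary."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(boundary, a)])

def update(boundary, action, reward):
    """Incremental-mean update of the boundary-action value."""
    counts[(boundary, action)] += 1
    n = counts[(boundary, action)]
    q[(boundary, action)] += (reward - q[(boundary, action)]) / n

def rollout(boundary, action):
    # Stand-in for the robot/simulator: returns the forward displacement
    # achieved after switching to `action` at `boundary` (assumed here).
    base = {"primitive_a": 0.2, "primitive_b": 0.5, "primitive_c": 0.3}
    return base[action] + random.gauss(0, 0.05)

# Reward is forward translational movement, the objective named in the
# abstract; the learner needs no human-supplied labels, only rollouts.
for episode in range(500):
    for b in BOUNDARIES:
        a = choose_action(b)
        update(b, a, rollout(b, a))
```

After training, `max(ACTIONS, key=lambda a: q[(b, a)])` gives the learned switching rule at each boundary; the point of the sketch is only that the table being learned scales with the number of boundaries, not with the dimensionality of the continuous state.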
