Pneumatic artificial muscle-driven robot control using local update reinforcement learning

Graphical Abstract

In this study, a new value-function-based reinforcement learning (RL) algorithm, Local Update Dynamic Policy Programming (LUDPP), is proposed. It exploits the smoothness of the policy update enforced by Kullback–Leibler (KL) divergence regularization to update its value function only locally, which considerably reduces the computational complexity. We first investigated the learning performance of LUDPP, compared with algorithms lacking a smooth policy update, on pendulum swing-up and n-DOF manipulator reaching tasks in simulation. Only LUDPP could efficiently and stably learn good control policies in high-dimensional systems from a limited number of training samples. As a real-world application, we applied LUDPP to control pneumatic artificial muscle (PAM)-driven robots without knowledge of the model, a setting that is challenging for traditional methods due to the strong nonlinearity of the PAMs' air-pressure dynamics and mechanical structure. LUDPP successfully achieved single-finger control of the Shadow Dexterous Hand, a PAM-driven humanoid robot hand, with far lower computational cost than other conventional value-function-based RL algorithms.

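To make the mechanism named in the abstract concrete: Dynamic Policy Programming, on which LUDPP builds, replaces the hard max of value iteration with a KL-regularized (Boltzmann) update of action preferences, so successive policies change smoothly and each sampled transition only affects preferences near the visited states. The following is a minimal tabular sketch of that recursion, offered as an illustration under stated assumptions rather than the paper's function-approximation implementation; the names (psi, eta, dpp_update) are ours, not from the source.

    import numpy as np

    def boltzmann_weighted_mean(psi_s, eta):
        """M_eta operator: Boltzmann-weighted average of the action
        preferences psi_s at one state (the soft analogue of max)."""
        w = np.exp(eta * (psi_s - psi_s.max()))  # max-shift for stability
        w /= w.sum()
        return float(w @ psi_s)

    def dpp_update(psi, transitions, eta=1.0, gamma=0.99):
        """One sweep of the sampled DPP preference recursion over
        transitions (s, a, r, s').  Only entries visited by the samples
        change, which is the locality that LUDPP exploits: with local
        basis functions, preferences far from the data stay untouched."""
        for s, a, r, s_next in transitions:
            psi[s, a] += (r
                          + gamma * boltzmann_weighted_mean(psi[s_next], eta)
                          - boltzmann_weighted_mean(psi[s], eta))
        return psi

    def policy(psi_s, eta):
        """KL-regularized (Boltzmann) policy over action preferences."""
        p = np.exp(eta * (psi_s - psi_s.max()))
        return p / p.sum()

    if __name__ == "__main__":
        # Toy usage with random transitions; in the paper these would
        # come from robot rollouts.
        rng = np.random.default_rng(0)
        n_states, n_actions = 10, 3
        psi = np.zeros((n_states, n_actions))
        batch = [(rng.integers(n_states), rng.integers(n_actions),
                  rng.normal(), rng.integers(n_states)) for _ in range(100)]
        for _ in range(50):
            psi = dpp_update(psi, batch)
        print(policy(psi[0], eta=1.0))

In LUDPP this locality is carried over to function approximation: presumably only the basis-function weights influenced by each sample are updated, which is where the reduction in computational complexity reported in the abstract comes from.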