Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control

Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives reward from the environment based on how good the decisions are and tries to find an optimal decision-making policy that maximises its longterm cumulative reward. This paper presents a novel approach which has showon promise in applying accelerated simulation of RL policy training to automating the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO), which results in a 43% speed improvement over a previous FPGA implementation, while achieving 4.65 times speed up against deep learning libraries running on GPU and 19.29 times speed up against CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach to and pick up predefined objects, demonstrating the feasibility of our approach.

[1]  Jan Peters,et al.  Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..

[2]  Barak A. Pearlmutter Fast Exact Multiplication by the Hessian , 1994, Neural Computation.

[3]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[5]  Sergey Levine,et al.  Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[6]  Federico Baronti,et al.  An FPGA-based controller for collaborative robotics , 2017, 2017 IEEE 26th International Symposium on Industrial Electronics (ISIE).

[7]  Pieter Abbeel,et al.  Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.

[8]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[9]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[10]  David B. Thomas,et al.  Neural Network Based Reinforcement Learning Acceleration on FPGA Platforms , 2017, CARN.

[11]  Wayne Luk,et al.  Customised pearlmutter propagation: A hardware architecture for trust region policy optimisation , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).