Binarized P-Network: Deep Reinforcement Learning of Robot Control from Raw Images on FPGA

This letter explores a deep reinforcement learning (DRL) approach to designing image-based control for edge robots implemented on Field Programmable Gate Arrays (FPGAs). Although FPGAs are more power-efficient than CPUs and GPUs, typical DRL methods cannot be applied to them directly, because FPGAs are composed of many Logic Blocks (LBs) that perform logical operations at high speed but real-number operations only at low speed. To cope with this problem, we propose a novel DRL algorithm called Binarized P-Network (BPN), which learns image-input control policies using Binarized Convolutional Neural Networks (BCNNs). Because a BCNN has low function approximation accuracy, which can destabilize reinforcement learning, BPN adopts a robust value update scheme called Conservative Value Iteration, which is tolerant of function approximation errors. We confirmed BPN's effectiveness by applying it to a visual tracking task in simulation and in real-robot experiments with an FPGA.
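To illustrate why binarized networks suit hardware that favors logical operations, the following is a minimal sketch of the standard XNOR/popcount identity for dot products of {-1, +1} vectors. It is not taken from the letter: the function names (`binarize`, `pack_bits`, `xnor_popcount_dot`) are hypothetical, the choice of mapping zero to +1 is an assumption, and the code only demonstrates the arithmetic, not the BPN algorithm or its FPGA implementation.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} by sign (assumption: 0 maps to +1)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def pack_bits(b):
    """Encode a {-1, +1} vector as an integer bit mask (+1 -> 1, -1 -> 0)."""
    mask = 0
    for i, v in enumerate(b):
        if v == 1:
            mask |= 1 << i
    return mask

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    Each matching bit contributes +1 and each mismatching bit -1, so the
    result is n - 2 * (number of mismatches), computed with XOR and popcount.
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches

# Compare against an ordinary integer dot product on random data.
rng = np.random.default_rng(0)
weights = binarize(rng.standard_normal(64))
inputs = binarize(rng.standard_normal(64))
fast = xnor_popcount_dot(pack_bits(weights), pack_bits(inputs), 64)
reference = int(np.dot(weights.astype(np.int32), inputs.astype(np.int32)))
print(fast == reference)
```

In this sketch the multiply-accumulate of a real-valued layer is replaced by a bitwise operation and a bit count, which is the kind of operation the abstract describes FPGA logic blocks as handling at high speed.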
