Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs

The effectiveness of Convolutional Neural Networks (CNNs) has been proven in a wide range of machine learning applications. However, the high computational complexity of CNNs remains a critical obstacle to their broader adoption in real-time and power-constrained scenarios. FPGAs are poised to play a significant role in high-performance, energy-efficient CNN computation in both mobile (e.g., UAVs, self-driving cars, and IoT devices) and cloud computing domains. Implementing an efficient CNN system on FPGAs, however, remains difficult, and the unique design constraints and architectural characteristics of current cloud-based FPGAs further increase the challenge. To address these challenges, we propose Cloud-DNN, an open-source automated tool chain. Cloud-DNN takes a trained CNN model specified in Caffe as input, performs a set of transformations, and maps the model to a cloud-based FPGA. It can significantly improve the overall design productivity of CNNs on FPGAs while satisfying emerging computational requirements. Our design provides an alternative to other cloud-based options (e.g., GPUs or TPUs) while offering flexible, high-performance DNN inference. The distinguishing features of Cloud-DNN are its optimizations for cloud-platform characteristics and its support for an easier, more streamlined implementation flow. Experimental results demonstrate up to a 104.55x performance improvement over a CPU implementation, as well as comparable usability and flexibility and strong result quality relative to state-of-the-art DNN inference implementations on standalone FPGAs.
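To make the front-end step concrete, the following is a minimal sketch (not the actual Cloud-DNN code) of how a tool chain of this kind might ingest a trained Caffe model description and produce a per-layer intermediate representation for an FPGA back end. It assumes that caffe_pb2, generated from Caffe's caffe.proto, is importable; the names LayerSpec and emit_layer_specs are illustrative only.

    # Hypothetical front-end sketch: parse a Caffe prototxt, collect
    # convolution-layer parameters, and hand them to a back end that would
    # map each layer onto FPGA (e.g., HLS) kernels.
    # Assumption: caffe_pb2 is available (compiled from Caffe's caffe.proto).

    from dataclasses import dataclass
    from typing import List

    from google.protobuf import text_format
    import caffe_pb2  # generated from caffe.proto (assumed available)


    @dataclass
    class LayerSpec:
        """Minimal per-layer record passed to the FPGA mapping back end."""
        name: str
        type: str
        num_output: int = 0
        kernel: int = 0
        stride: int = 1
        pad: int = 0


    def emit_layer_specs(prototxt_path: str) -> List[LayerSpec]:
        """Parse a Caffe network description and extract convolution layers."""
        net = caffe_pb2.NetParameter()
        with open(prototxt_path) as f:
            text_format.Merge(f.read(), net)

        specs = []
        for layer in net.layer:
            if layer.type != "Convolution":
                continue
            conv = layer.convolution_param
            specs.append(LayerSpec(
                name=layer.name,
                type=layer.type,
                num_output=conv.num_output,
                kernel=conv.kernel_size[0] if conv.kernel_size else 0,
                stride=conv.stride[0] if conv.stride else 1,
                pad=conv.pad[0] if conv.pad else 0,
            ))
        return specs


    if __name__ == "__main__":
        # "vgg16_deploy.prototxt" is a placeholder path for illustration.
        for spec in emit_layer_specs("vgg16_deploy.prototxt"):
            print(spec)

In an actual flow, records like these would drive the subsequent transformation and mapping stages (e.g., tiling, resource allocation, and HLS code generation) rather than simply being printed.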
