FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates

DNNs (Deep Neural Networks) have demonstrated great success in numerous applications such as image classification, speech recognition, and video analysis. However, DNNs are much more computation-intensive and memory-intensive than previous shallow models, so deploying them is challenging both in large-scale data centers and in real-time embedded systems. Considering performance, flexibility, and energy efficiency, FPGA-based accelerators for DNNs are a promising solution. Unfortunately, conventional accelerator design flows make it difficult for FPGA developers to keep up with the fast pace of innovation in DNNs. To overcome this problem, we propose FP-DNN (Field Programmable DNN), an end-to-end framework that takes TensorFlow-described DNNs as input and automatically generates hardware implementations on FPGA boards using RTL-HLS hybrid templates. FP-DNN performs DNN model inference with our high-performance computation engine and carefully designed communication optimization strategies. We implement CNNs, LSTM-RNNs, and Residual Nets with FP-DNN, and experimental results demonstrate the high performance and flexibility of the proposed framework.
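To make the idea of mapping TensorFlow-described layers onto a shared hardware engine more concrete, below is a minimal, illustrative Python sketch of the kind of layer-level intermediate representation an automated framework might extract from a model description before binding each layer to a hardware template. It assumes, as is common in such designs, that convolutional, fully-connected, and LSTM layers are all lowered to matrix multiplications; all class and function names here are hypothetical and are not part of FP-DNN or TensorFlow.

```python
# Illustrative sketch only: a toy layer-level IR that a TensorFlow-to-FPGA
# framework might build after parsing a model description. All names are
# hypothetical placeholders, not the actual FP-DNN implementation.
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class ConvLayer:
    in_channels: int
    out_channels: int
    kernel: int          # square kernel size
    stride: int = 1


@dataclass
class FCLayer:
    in_features: int
    out_features: int


@dataclass
class LSTMLayer:
    input_size: int
    hidden_size: int


Layer = Union[ConvLayer, FCLayer, LSTMLayer]


def as_matmul_shapes(layers: List[Layer]) -> List[Tuple[int, int]]:
    """Lower each layer to the weight-matrix shape (rows, cols) that a
    shared matrix-multiplication engine would see: im2col-style flattening
    for convolutions, fused gate weights for LSTMs."""
    shapes = []
    for layer in layers:
        if isinstance(layer, ConvLayer):
            shapes.append((layer.out_channels,
                           layer.in_channels * layer.kernel * layer.kernel))
        elif isinstance(layer, FCLayer):
            shapes.append((layer.out_features, layer.in_features))
        elif isinstance(layer, LSTMLayer):
            # The four LSTM gates share one fused weight matrix over [x, h].
            shapes.append((4 * layer.hidden_size,
                           layer.input_size + layer.hidden_size))
    return shapes


if __name__ == "__main__":
    # A tiny VGG-like fragment followed by an LSTM layer.
    model = [ConvLayer(3, 64, 3), ConvLayer(64, 64, 3),
             FCLayer(64 * 7 * 7, 4096), LSTMLayer(4096, 1024)]
    for layer, shape in zip(model, as_matmul_shapes(model)):
        print(type(layer).__name__, "->", shape)
```

The abstract itself only states that FP-DNN uses a high-performance computation engine with communication optimizations; the uniform lowering shown above is one plausible way such an engine can serve CNN, LSTM-RNN, and Residual-Net layers from a single set of templates.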
