Customizing Neural Networks for Efficient FPGA Implementation

We propose a novel end-to-end framework to customize the execution of deep neural networks on FPGA platforms. Our framework employs a reconfigurable clustering approach that encodes the parameters of deep neural networks in accordance with the application's accuracy requirement and the underlying platform constraints. Since the throughput of FPGA-based realizations of neural networks is often bounded by memory access bandwidth, the use of encoded parameters reduces both the required memory bandwidth and the computational complexity of the network, increasing the effective throughput. Our framework enables systematic customization of encoded deep neural networks for different FPGA platforms. Proof-of-concept evaluations on four different applications demonstrate up to a 9-fold reduction in memory footprint and a 15-fold improvement in operational throughput, while the drop in accuracy remains below 0.1%.
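The abstract does not include code, but the core idea of encoding parameters via clustering can be illustrated with a minimal sketch. The snippet below assumes one plausible instantiation: k-means clustering of a layer's weights into a small codebook, so each 32-bit weight is replaced by a short index (4 bits for a 16-entry codebook, roughly an 8-fold reduction before overhead). The function names encode_layer and decode_layer are hypothetical and not from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def encode_layer(weights: np.ndarray, n_clusters: int = 16, seed: int = 0):
    """Cluster a layer's weights into an n_clusters-entry codebook.

    Returns (codebook, indices): each weight is replaced by a
    log2(n_clusters)-bit index into the codebook, shrinking 32-bit
    floats to 4-bit indices when n_clusters == 16.
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(flat)
    codebook = km.cluster_centers_.ravel().astype(np.float32)
    indices = km.labels_.astype(np.uint8).reshape(weights.shape)
    return codebook, indices

def decode_layer(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights by codebook lookup."""
    return codebook[indices]

# Example: encode a random 256x128 weight matrix with a 16-entry codebook.
w = np.random.randn(256, 128).astype(np.float32)
codebook, idx = encode_layer(w, n_clusters=16)
w_hat = decode_layer(codebook, idx)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

On an FPGA, the small codebook can live in on-chip memory while only the narrow indices stream from external DRAM, which is one way the reduced memory bandwidth described above could translate into higher effective throughput.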
