MuDBN: An Energy-Efficient and High-Performance Multi-FPGA Accelerator for Deep Belief Networks

With the growing size of neural networks, state-of-the-art deep neural networks (DNNs) contain hundreds of millions of parameters. Because of their multiple fully-connected layers, DNNs are both compute- and memory-intensive, making them hard to deploy on embedded devices with limited power budgets and hardware resources. This paper therefore presents a deep belief network (DBN) accelerator built on a multi-FPGA system. Two mapping schemes, division between layers (DBL) and division inside layers (DIL), are adopted to partition the DBN across the FPGAs. Experimental results demonstrate that the accelerator achieves a 4.24× (DBL) to 6.20× (DIL) speedup over an Intel Core i7 CPU and reduces power consumption by 119× (DBL) and 90× (DIL) compared to an NVIDIA Tesla K40C GPU.
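The abstract names the two mapping schemes but does not spell them out. As a rough illustrative sketch (not the paper's implementation; the layer sizes, FPGA count, and function names below are hypothetical), DBL assigns whole layers to different FPGAs so the network runs as a pipeline across devices, while DIL splits each layer's weight matrix across FPGAs so all devices compute the same layer in parallel:

```python
import numpy as np

# Hypothetical DBN configuration, chosen only for illustration.
layer_sizes = [784, 512, 512, 256, 10]
n_fpgas = 4

rng = np.random.default_rng(0)
# One weight matrix per fully-connected layer.
weights = [rng.standard_normal((m, n))
           for m, n in zip(layer_sizes, layer_sizes[1:])]

def map_dbl(weights, n_fpgas):
    """Division between layers: whole layers assigned to FPGAs
    round-robin, forming an inter-device pipeline."""
    return {i: [w for j, w in enumerate(weights) if j % n_fpgas == i]
            for i in range(n_fpgas)}

def map_dil(weights, n_fpgas):
    """Division inside layers: each layer's weight matrix split
    column-wise, so every FPGA holds a slice of every layer."""
    return {i: [np.array_split(w, n_fpgas, axis=1)[i] for w in weights]
            for i in range(n_fpgas)}

def forward(weights, x):
    """Reference forward pass with sigmoid activations."""
    for w in weights:
        x = 1.0 / (1.0 + np.exp(-(x @ w)))
    return x
```

Under DIL, concatenating the per-FPGA partial products of one layer reproduces the full matrix product for that layer before the activation is applied, which is why the column-wise split preserves the network's output.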
