Accelerating Training of Deep Neural Networks via Sparse Edge Processing

We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements. This novel architecture introduces the notion of edge processing to provide flexibility, and combines junction pipelining with operational parallelization to speed up training. The overall effect is to reduce network complexity by factors of up to 30x and training time by up to 35x relative to GPUs, while maintaining high fidelity of inference results. This has the potential to enable extensive parameter searches and development of the largely unexplored theoretical foundation of DNNs. The architecture automatically adapts itself to different network sizes given available hardware resources. As proof of concept, we show results obtained for different bit widths.
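To make the idea of algorithmically pre-determined, structured sparsity concrete, the following is a minimal sketch (not the paper's exact scheme): a layer whose connectivity is fixed before training so that every output neuron has a constant fan-in drawn from the inputs. The pattern generator, class name, and parameter choices (`predetermined_connections`, `SparseLayer`, `fan_in=32`) are illustrative assumptions, not the architecture described above.

```python
import numpy as np

def predetermined_connections(n_in, n_out, fan_in, seed=0):
    """Return an (n_out, fan_in) index array fixed before training.

    The seeded generator makes the pattern deterministic, standing in for
    an algorithmically pre-determined connectivity rule (an assumption;
    the paper's actual pattern may differ).
    """
    rng = np.random.default_rng(seed)
    return np.stack([rng.choice(n_in, size=fan_in, replace=False)
                     for _ in range(n_out)])

class SparseLayer:
    def __init__(self, n_in, n_out, fan_in):
        # Connectivity is decided once, before any training step.
        self.idx = predetermined_connections(n_in, n_out, fan_in)
        # Only fan_in weights per output neuron are stored and updated,
        # instead of a dense (n_out, n_in) weight matrix.
        self.w = 0.01 * np.random.randn(n_out, fan_in)

    def forward(self, x):
        # Gather each output neuron's fixed subset of inputs, then take
        # per-neuron dot products with the stored sparse weights.
        return np.einsum('of,of->o', self.w, x[self.idx])

# Example: 784 inputs, 100 outputs, fan-in 32 stores ~24x fewer weights
# than the dense 784x100 layer, mirroring the kind of complexity
# reduction the abstract reports.
layer = SparseLayer(n_in=784, n_out=100, fan_in=32)
y = layer.forward(np.random.randn(784))
```

Because the connection pattern is fixed up front rather than learned or pruned, the hardware can lay out memory and datapaths for it statically, which is what enables the junction pipelining and parallelization described in the abstract.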
