Accelerating CNN Algorithm with Fine-Grained Dataflow Architectures

The Convolutional Neural Network (CNN) is a state-of-the-art algorithm widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of its high computational complexity, many hardware accelerators have been proposed to exploit a high degree of parallelism in CNN processing. However, accelerators implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption, while general-purpose accelerators such as GPUs achieve flexibility at the cost of higher power consumption. Fine-grained dataflow architectures, which depart from the conventional von Neumann model, offer natural advantages for CNN-like algorithms: high computational efficiency and low power consumption, while remaining broadly applicable and adaptable. In this paper, we propose a scheme for implementing and optimizing CNNs on accelerators based on fine-grained dataflow architectures. The experimental results show that, with our scheme, AlexNet running on the dataflow accelerator achieves 3.11× the performance of an NVIDIA Tesla K80, while the power consumption of our hardware is 8.52× lower than that of the K80.
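For reference, the sketch below shows the direct convolution loop nest that is the dominant computation in CNN layers and the target of such accelerators. It is a minimal C illustration with assumed layer dimensions, not the paper's dataflow implementation: each output element is an independent multiply-accumulate reduction, which is the fine-grained parallelism a dataflow architecture can exploit.

/*
 * Minimal sketch (illustrative only): direct 2D convolution for one layer.
 * Layer sizes below are assumptions chosen to keep the example small.
 */
#include <stdio.h>

#define IN_C   3                  /* input channels   */
#define OUT_C  8                  /* output channels  */
#define IN_H   8                  /* input height     */
#define IN_W   8                  /* input width      */
#define K      3                  /* kernel size      */
#define OUT_H  (IN_H - K + 1)
#define OUT_W  (IN_W - K + 1)

static float in[IN_C][IN_H][IN_W];
static float w[OUT_C][IN_C][K][K];
static float out[OUT_C][OUT_H][OUT_W];

/* Every (oc, oh, ow) output is an independent multiply-accumulate
 * reduction over (ic, kh, kw); these independent reductions are what
 * a fine-grained dataflow accelerator maps to parallel functional units. */
static void conv2d(void) {
    for (int oc = 0; oc < OUT_C; oc++)
        for (int oh = 0; oh < OUT_H; oh++)
            for (int ow = 0; ow < OUT_W; ow++) {
                float acc = 0.0f;
                for (int ic = 0; ic < IN_C; ic++)
                    for (int kh = 0; kh < K; kh++)
                        for (int kw = 0; kw < K; kw++)
                            acc += in[ic][oh + kh][ow + kw] *
                                   w[oc][ic][kh][kw];
                out[oc][oh][ow] = acc;
            }
}

int main(void) {
    /* Deterministic dummy inputs and weights, just to make the sketch runnable. */
    for (int ic = 0; ic < IN_C; ic++)
        for (int h = 0; h < IN_H; h++)
            for (int x = 0; x < IN_W; x++)
                in[ic][h][x] = (float)(ic + h + x) * 0.01f;
    for (int oc = 0; oc < OUT_C; oc++)
        for (int ic = 0; ic < IN_C; ic++)
            for (int kh = 0; kh < K; kh++)
                for (int kw = 0; kw < K; kw++)
                    w[oc][ic][kh][kw] = 0.1f;

    conv2d();
    printf("out[0][0][0] = %f\n", out[0][0][0]);
    return 0;
}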
