Snowflake: An efficient hardware accelerator for convolutional neural networks

Deep learning is becoming increasingly popular for a wide variety of applications, including object detection, classification, semantic segmentation, and natural language processing. Convolutional neural networks (CNNs) are a class of deep neural networks that achieve high accuracy on these tasks. CNNs are hierarchical mathematical models that require billions of operations to produce a single output. This high computational complexity, combined with the inherent parallelism in these models, makes them an excellent target for custom accelerators. In this work we present Snowflake, a scalable, efficient, low-power accelerator that is agnostic to CNN architecture. Our design achieves an average computational efficiency of 91%, significantly higher than that of comparable architectures. We implemented Snowflake on a Xilinx Zynq XC7Z045 APSoC. On this platform, Snowflake delivers 128 G-ops/s while consuming 9.48 W of power. It achieves a throughput of 98 frames per second and an energy efficiency of 10.3 frames per joule on AlexNet, and 34 frames per second and 3.6 frames per joule on GoogLeNet.
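
The energy-efficiency figures quoted above follow directly from the measured throughput and power; as a quick sanity check (assuming frames per joule is simply frames per second divided by the average power draw of 9.48 W, a reasonable reading of the abstract rather than a statement from the paper itself):

\[
\text{frames/J} \;=\; \frac{\text{frames/s}}{\text{power (W)}}, \qquad
\frac{98~\text{frames/s}}{9.48~\text{W}} \approx 10.3~\text{frames/J}, \qquad
\frac{34~\text{frames/s}}{9.48~\text{W}} \approx 3.6~\text{frames/J}.
\]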