Design of a Sparsity-Aware Reconfigurable Deep Learning Accelerator Supporting Various Types of Operations

Deep Neural Network (DNN) models such as Convolutional Neural Networks (CNN), Generative Adversarial Networks (GAN), and Recurrent Neural Networks (RNN) have proven effective in many real-world applications and have received much attention. However, different DNN models rely on different types of operations. CNN models are usually composed of convolutional layers (Conv) and fully-connected layers (FC), while lightweight CNN models such as MobileNet compress the network by replacing standard convolution with depthwise separable convolution, which combines depthwise convolution (DWC) and pointwise convolution (PWC). In addition to regular convolution, de-convolution (DeConv) is widely used in many GAN models, and many RNN models employ long short-term memory (LSTM) cells to control the update of internal states. This high diversity of DNN operations poses great design challenges for reconfigurable Deep Learning (DL) accelerators that must support all of them, whereas most recent DL accelerators target only a subset of these operations and therefore lack computing flexibility. In this paper, by exploiting the sparsity present in current DNN models, we design sparsity-aware DL hardware accelerators that support efficient computation of a wide range of DNN operations, including Conv, DeConv, DWC, PWC, FC, and LSTM. By reconfiguring the dataflow and parallelizing different operations, the proposed designs not only improve system performance but also increase hardware utilization while significantly reducing the power consumed by memory accesses and arithmetic computations.
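As a rough illustration of two ideas the abstract relies on, the following Python sketch first counts multiply-accumulate (MAC) operations for a standard convolution versus its depthwise + pointwise decomposition (the MobileNet-style compression mentioned above), and then shows a toy zero-skipping dot product, which is the effect a sparsity-aware processing element aims for in hardware. The layer dimensions and operand values are hypothetical examples chosen only for illustration; they are not figures from the paper.

```python
# Rough MAC-count comparison: standard Conv vs. depthwise separable Conv,
# plus a toy zero-skipping multiply-accumulate loop.
# All dimensions below are hypothetical examples, not figures from the paper.

def conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k

def dw_separable_macs(h, w, c_in, c_out, k):
    """MACs for depthwise (k x k per channel) + pointwise (1 x 1) convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

h, w, c_in, c_out, k = 56, 56, 128, 128, 3   # example MobileNet-like layer shape
std = conv_macs(h, w, c_in, c_out, k)
sep = dw_separable_macs(h, w, c_in, c_out, k)
print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio ~{std / sep:.1f}x")

# Toy sparsity-aware dot product: multiply only where both operands are non-zero,
# which is what zero-skipping hardware tries to exploit.
def sparse_dot(acts, weights):
    total = 0
    for a, wgt in zip(acts, weights):
        if a != 0 and wgt != 0:      # skip ineffectual MACs
            total += a * wgt
    return total

print(sparse_dot([0, 3, 0, 2], [5, 0, 7, 1]))  # only one effective MAC (2 * 1)
```

With these example dimensions the separable decomposition needs roughly 8x fewer MACs than the standard convolution, which is the arithmetic saving that motivates DWC/PWC support in the accelerator; the zero-skipping loop likewise shows why sparsity in activations and weights translates into fewer effective computations.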
