GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration

Generative networks have become ubiquitous in image generation applications such as image super-resolution, image-to-image translation, and text-to-image synthesis. They are usually composed of convolutional (CONV) layers, convolution-based residual blocks, and deconvolutional (DeCONV) layers. Prior work on neural network acceleration concentrates on optimizing CONV layer computation, e.g., through data reuse and parallelism, but suffers low processing element (PE) utilization on residual blocks and DeCONV layers: residual blocks demand very high memory bandwidth for the elementwise additions on their residual paths, and DeCONV layers have imbalanced operation counts across output positions. In this paper, we propose a dual convolution mapping method for CONV and DeCONV layers that makes full use of the available PE resources. We also propose a cross-layer scheduling method that avoids extra off-chip memory accesses during residual block processing. Precision-adaptive PEs and buffer bandwidth reconfiguration support flexible bitwidths for both inputs and weights in deep neural networks. We implement a generative network accelerator (GNA) based on these intra-PE processing, inter-PE processing, and cross-layer scheduling techniques. Owing to the proposed optimizations, GNA achieves an energy efficiency of 2.05 TOPS/W with 61% higher PE utilization than traditional methods in generative network acceleration.
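To make the DeCONV imbalance concrete, here is a minimal 1-D sketch (assumed stride 2 and kernel size 3; the paper's dual convolution mapping targets full 2-D layers and may differ in detail). It first computes the deconvolution by zero-insertion and counts the useful multiplies per output, then applies the standard decomposition into per-phase sub-convolutions whose per-output workloads are uniform.

```python
import numpy as np

# Hypothetical 1-D DeCONV parameters for illustration only.
x = np.arange(1, 6, dtype=float)   # 5 input samples
w = np.array([1.0, 2.0, 3.0])      # kernel of size K = 3
S, K = 2, 3                        # stride and kernel size

# Way 1: zero-insertion. Upsample by inserting zeros, then run a plain
# stride-1 convolution. Output positions see different numbers of nonzero
# inputs, so their operation counts are imbalanced and PEs sit idle.
up = np.zeros(len(x) * S)
up[::S] = x
ops = [np.count_nonzero(up[i:i + K]) for i in range(len(up) - K + 1)]
print("useful multiplies per output:", ops)   # alternates 2, 1, 2, 1, ...

# Way 2: decomposition. Split the DeCONV into S dense sub-convolutions,
# one per output phase, each using only the kernel taps that phase needs.
# Within a phase, every output costs the same number of multiplies, so a
# PE group mapped to that phase stays fully utilized.
for p in range(S):
    sub_w = w[p::S]                # taps contributing to output phase p
    print(f"phase {p}: kernel {sub_w}, {len(sub_w)} multiplies per output")
```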

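The cross-layer scheduling idea can be sketched as follows, with a hypothetical row-tiling scheme (tile_rows, conv3x3, and residual_block_fused are illustrative names, not the paper's interfaces). Both CONV layers and the elementwise addition complete on one tile before it is written back, so neither the intermediate feature map nor the residual copy ever travels to off-chip DRAM.

```python
import numpy as np

def conv3x3(tile, w):
    """Toy single-channel 'same' 3x3 convolution (halo rows between
    neighboring tiles are omitted for brevity)."""
    padded = np.pad(tile, 1)
    out = np.zeros_like(tile)
    for i in range(tile.shape[0]):
        for j in range(tile.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def residual_block_fused(fmap, w1, w2, tile_rows=8):
    """Per-tile schedule: CONV1 -> ReLU -> CONV2 -> elementwise add all
    finish while the tile sits in the on-chip buffer; only the block's
    input is read from and its output written to off-chip memory."""
    out = np.empty_like(fmap)
    for r in range(0, fmap.shape[0], tile_rows):
        tile = fmap[r:r + tile_rows]            # one off-chip read per tile
        identity = tile                         # residual path held on-chip
        t = np.maximum(conv3x3(tile, w1), 0.0)  # CONV1 + ReLU, stays on-chip
        t = conv3x3(t, w2)                      # CONV2, stays on-chip
        out[r:r + tile_rows] = t + identity     # add, then one write-back
    return out

# Usage with random data:
y = residual_block_fused(np.random.rand(16, 16),
                         np.random.rand(3, 3), np.random.rand(3, 3))
```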
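Precision-adaptive PEs are commonly built on subword-parallel arithmetic, sketched below: two narrow products share one wide multiplier by packing the operands with a guard gap. This is a generic illustration under assumed unsigned 4-bit activations and an 8-bit shared weight; GNA's actual PE datapath, and the correction terms needed for signed operands, are not reproduced here.

```python
# Two unsigned 4-bit activations multiplied by one 8-bit weight using a
# single wide multiply. Each 4b x 8b product fits in 12 bits, so a 12-bit
# gap keeps the two partial products from overlapping.
a0, a1 = 5, 9            # unsigned 4-bit activations (illustrative values)
w = 23                   # unsigned 8-bit weight shared by both lanes
GAP = 12                 # lane width: a 4b x 8b product needs <= 12 bits

packed = a0 | (a1 << GAP)      # pack both activations into one operand
prod = packed * w              # one wide multiply computes both products
p0 = prod & 0xFFF              # lane 0 product in bits [0:12]
p1 = (prod >> GAP) & 0xFFF     # lane 1 product in bits [12:24]
assert (p0, p1) == (a0 * w, a1 * w)
print(p0, p1)                  # 115 207
```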