GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration

Generative networks have become ubiquitous in image generation applications such as image super-resolution, image-to-image translation, and text-to-image synthesis. They are usually composed of convolutional (CONV) layers, convolution-based residual blocks, and deconvolutional (DeCONV) layers. Prior work on neural network acceleration concentrates on optimizing CONV layer computation, e.g., through data reuse and parallelism, but suffers low processing element (PE) utilization on residual blocks and DeCONV layers: residual blocks demand very high memory bandwidth for the elementwise additions on their residual paths, and DeCONV layers have imbalanced operation counts across output positions. In this paper, we propose a dual convolution mapping method for CONV and DeCONV layers that makes full use of the available PE resources. We also propose a cross-layer scheduling method that avoids extra off-chip memory accesses during residual block processing. Precision-adaptive PEs and buffer bandwidth reconfiguration support flexible bitwidths for both inputs and weights in deep neural networks. We implement a generative network accelerator (GNA) based on these intra-PE processing, inter-PE processing, and cross-layer scheduling techniques. Owing to the proposed optimizations, GNA achieves an energy efficiency of 2.05 TOPS/W with 61% higher PE utilization than traditional methods in generative network acceleration.
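To make the DeCONV imbalance concrete, here is a minimal 1-D sketch (assumed stride 2 and kernel size 3; the paper's dual convolution mapping targets full 2-D layers and may differ in detail). It first computes the deconvolution by zero-insertion and counts the useful multiplies per output, then applies the standard decomposition into per-phase sub-convolutions whose per-output workloads are uniform.

```python
import numpy as np

# Hypothetical 1-D DeCONV parameters for illustration only.
x = np.arange(1, 6, dtype=float)   # 5 input samples
w = np.array([1.0, 2.0, 3.0])      # kernel of size K = 3
S, K = 2, 3                        # stride and kernel size

# Way 1: zero-insertion. Upsample by inserting zeros, then run a plain
# stride-1 convolution. Output positions see different numbers of nonzero
# inputs, so their operation counts are imbalanced and PEs sit idle.
up = np.zeros(len(x) * S)
up[::S] = x
ops = [np.count_nonzero(up[i:i + K]) for i in range(len(up) - K + 1)]
print("useful multiplies per output:", ops)   # alternates 2, 1, 2, 1, ...

# Way 2: decomposition. Split the DeCONV into S dense sub-convolutions,
# one per output phase, each using only the kernel taps that phase needs.
# Within a phase, every output costs the same number of multiplies, so a
# PE group mapped to that phase stays fully utilized.
for p in range(S):
    sub_w = w[p::S]                # taps contributing to output phase p
    print(f"phase {p}: kernel {sub_w}, {len(sub_w)} multiplies per output")
```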

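The cross-layer scheduling idea can be sketched as follows, with a hypothetical row-tiling scheme (tile_rows, conv3x3, and residual_block_fused are illustrative names, not the paper's interfaces). Both CONV layers and the elementwise addition complete on one tile before it is written back, so neither the intermediate feature map nor the residual copy ever travels to off-chip DRAM.

```python
import numpy as np

def conv3x3(tile, w):
    """Toy single-channel 'same' 3x3 convolution (halo rows between
    neighboring tiles are omitted for brevity)."""
    padded = np.pad(tile, 1)
    out = np.zeros_like(tile)
    for i in range(tile.shape[0]):
        for j in range(tile.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def residual_block_fused(fmap, w1, w2, tile_rows=8):
    """Per-tile schedule: CONV1 -> ReLU -> CONV2 -> elementwise add all
    finish while the tile sits in the on-chip buffer; only the block's
    input is read from and its output written to off-chip memory."""
    out = np.empty_like(fmap)
    for r in range(0, fmap.shape[0], tile_rows):
        tile = fmap[r:r + tile_rows]            # one off-chip read per tile
        identity = tile                         # residual path held on-chip
        t = np.maximum(conv3x3(tile, w1), 0.0)  # CONV1 + ReLU, stays on-chip
        t = conv3x3(t, w2)                      # CONV2, stays on-chip
        out[r:r + tile_rows] = t + identity     # add, then one write-back
    return out

# Usage with random data:
y = residual_block_fused(np.random.rand(16, 16),
                         np.random.rand(3, 3), np.random.rand(3, 3))
```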
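Precision-adaptive PEs are commonly built on subword-parallel arithmetic, sketched below: two narrow products share one wide multiplier by packing the operands with a guard gap. This is a generic illustration under assumed unsigned 4-bit activations and an 8-bit shared weight; GNA's actual PE datapath, and the correction terms needed for signed operands, are not reproduced here.

```python
# Two unsigned 4-bit activations multiplied by one 8-bit weight using a
# single wide multiply. Each 4b x 8b product fits in 12 bits, so a 12-bit
# gap keeps the two partial products from overlapping.
a0, a1 = 5, 9            # unsigned 4-bit activations (illustrative values)
w = 23                   # unsigned 8-bit weight shared by both lanes
GAP = 12                 # lane width: a 4b x 8b product needs <= 12 bits

packed = a0 | (a1 << GAP)      # pack both activations into one operand
prod = packed * w              # one wide multiply computes both products
p0 = prod & 0xFFF              # lane 0 product in bits [0:12]
p1 = (prod >> GAP) & 0xFFF     # lane 1 product in bits [12:24]
assert (p0, p1) == (a0 * w, a1 * w)
print(p0, p1)                  # 115 207
```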