ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM

Generative Adversarial Networks (GANs) have recently shown great promise for unsupervised learning, with the aim of reducing the massive human effort that supervised learning algorithms require for data labeling. A GAN sets a generative model and a discriminative model against each other in an adversarial setting so that each refines its abilities. Existing nonvolatile-memory-based machine learning accelerators, however, cannot support the computational needs of GAN training. Specifically, the generator uses a new operator, called transposed convolution, which causes significant resource underutilization when executed on conventional neural network accelerators because it inserts a large number of zeros into its input before the convolution operation. In this work, we propose a novel computational deformation technique that synergistically optimizes the forward and backward functions in transposed convolution to eliminate this resource underutilization. In addition, we present two dedicated control units, a dataflow mapper and an operation scheduler, to support the proposed execution model with high parallelism and low energy consumption. ZARA is implemented with commodity ReRAM chips, and experimental results show that our design improves GAN training performance by 1.6× to 23× on average over CMOS-based GAN accelerators. Compared to state-of-the-art ReRAM-based accelerator designs, ZARA also provides a 1.15× to 2.1× performance improvement.

CCS CONCEPTS • Hardware → Hardware accelerators;
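The zero-insertion overhead of transposed convolution can be illustrated with a small NumPy sketch. This is not ZARA's computational deformation technique, which the abstract does not detail; it only shows, under assumed sizes (a 4×4 input, a 4×4 kernel, stride 2), how a conventional accelerator would evaluate a transposed convolution as "insert zeros, then convolve", and that the result matches a direct scatter formulation that never touches a zero. The function names are hypothetical.

```python
import numpy as np

def zero_insert(x, stride):
    """Place (stride - 1) zeros between neighbouring elements of a 2D input."""
    h, w = x.shape
    out = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=x.dtype)
    out[::stride, ::stride] = x
    return out

def tconv_zero_insertion(x, kernel, stride):
    """Transposed convolution as zero-insert + pad + stride-1 convolution.

    Returns the output and the zero-padded input that is actually convolved.
    """
    k = kernel.shape[0]
    dilated = np.pad(zero_insert(x, stride), k - 1)  # constant (zero) padding
    flipped = kernel[::-1, ::-1]
    oh = dilated.shape[0] - k + 1
    ow = dilated.shape[1] - k + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, kernel))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(dilated[i:i + k, j:j + k] * flipped)
    return out, dilated

def tconv_scatter(x, kernel, stride):
    """Zero-free reference: every input element scatters a scaled kernel copy."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k),
                   dtype=np.result_type(x, kernel))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

# Assumed example sizes: 4x4 input with no zero entries, 4x4 kernel, stride 2.
x = np.arange(1, 17, dtype=float).reshape(4, 4)
kernel = np.arange(1, 17, dtype=float).reshape(4, 4)

out_zi, dilated = tconv_zero_insertion(x, kernel, stride=2)
out_sc = tconv_scatter(x, kernel, stride=2)
zero_fraction = 1.0 - np.count_nonzero(dilated) / dilated.size
```

With these sizes the convolved input grows to 13×13 while only the 16 original values are nonzero, so roughly 90% of the operands fed to the convolution are inserted zeros. That wasted work is the underutilization the abstract refers to.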
