Paper Information - FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks

Generative Adversarial Networks (GANs) are among the frontiers of deep networks. GANs consist of two models, a generative model and a discriminative model. While the discriminative model uses the conventional convolution operator, the generative model is fundamentally different in its use of the transposed convolution operator. Unlike the conventional convolution, the transposed convolution initially inserts a large number of zeros in its input. This zero-insertion leads to a large number of inconsequential operations and creates different patterns of computation across the sliding windows. The inconsequential operations, along with the variation in computation patterns, lead to significant resource underutilization when evaluated using conventional convolution hardware. This paper introduces FlexiGAN, an end-to-end solution, from high-level GAN specification to an optimized synthesizable FPGA accelerator. The FlexiGAN framework is coupled with a novel architecture that aims to harness the benefits of both MIMD and SIMD execution models. The proposed architecture separates data retrieval and data processing units at the finest granularity of each compute engine. Leveraging the separation between data retrieval and data processing units in the compute engines, we introduce a succinct set of operations that enable us to significantly reduce the on-chip memory usage, which is generally scarce in FPGAs. We evaluate our end-to-end solution across various GANs from the machine learning literature. FlexiGAN provides 2.4× higher performance than an optimized conventional convolution design. In addition, FlexiGAN, on average, yields 2.8× (up to 3.7×) improvements in Performance-per-Watt over a high-end GPU. These results indicate that FlexiGAN is an effective initial step towards providing an end-to-end solution for accelerating GANs.
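The effect of zero-insertion described in the abstract can be illustrated with a minimal Python sketch. It is not FlexiGAN's implementation: the function names (`zero_insert`, `window_patterns`, `useful_fraction`), the 4×4 input, the stride of 2, and the 3×3 window are all assumptions chosen for illustration, and border padding is omitted for brevity. The sketch inserts zeros between input elements, then records which positions of each sliding window hold original data.

```python
def zero_insert(x, stride):
    """Insert (stride - 1) zeros between neighbouring elements of a 2D list."""
    h, w = len(x), len(x[0])
    out_h, out_w = (h - 1) * stride + 1, (w - 1) * stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            out[i * stride][j * stride] = x[i][j]
    return out


def window_patterns(h, w, stride, k):
    """Return, for every k x k window over the zero-inserted h x w input,
    the tuple of window offsets that hold original (non-inserted) data."""
    # Use an all-ones input as a marker: 1 = original data, 0 = inserted zero.
    marker = zero_insert([[1] * w for _ in range(h)], stride)
    patterns = []
    for r in range(len(marker) - k + 1):
        for c in range(len(marker[0]) - k + 1):
            patterns.append(tuple(
                (dr, dc)
                for dr in range(k)
                for dc in range(k)
                if marker[r + dr][c + dc]
            ))
    return patterns


def useful_fraction(patterns, k):
    """Fraction of multiply-accumulates whose input operand is original data."""
    useful = sum(len(p) for p in patterns)
    return useful / (len(patterns) * k * k)


if __name__ == "__main__":
    # Hypothetical sizes: 4x4 input, stride 2, 3x3 window.
    pats = window_patterns(4, 4, stride=2, k=3)
    print("distinct window patterns:", len(set(pats)))
    print("useful fraction of operations:", useful_fraction(pats, 3))
```

Under these assumed sizes, the 25 windows fall into 4 distinct patterns and only 64 of the 225 multiply-accumulates (about 28%) touch original data. This is the kind of wasted work and window-to-window variation that the abstract attributes to running transposed convolution on conventional convolution hardware.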
