Methodology for Efficient Reconfigurable Architecture of Generative Neural Network

Generative neural networks have been developing rapidly in deep learning and have gained popularity in applications such as image generation, reading comprehension, and style transfer. Convolutional (CONV) and deconvolutional (DeCONV) layers are the typical building blocks of generative neural networks. Traditional convolution accelerators, however, suffer from overlapping and resource under-utilization when performing deconvolutions, and there has been little research on accelerating deconvolution. In this paper, we propose an efficient reconfigurable architecture for generative neural networks. First, a fast reconfigurable unit (FRU) based on the cascaded fast FIR algorithm (CFFA) is proposed to support both convolutions and deconvolutions, which resolves the overlapping and resource under-utilization problems. Second, a reconfigurable architecture built from FRUs is proposed for CONV and DeCONV layers. Third, a novel shift scale quantization method is proposed to quantize CONV and DeCONV layers uniformly, so that only integer computations are required. Finally, we implement a typical generative neural network on a Xilinx Zynq ZC706; the estimated performance reaches 62.85 GOPS at a 330 MHz working frequency. In brief, the proposed design significantly outperforms existing works, and in particular surpasses a related reconfigurable design by more than 20 times in performance density.
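To make the deconvolution inefficiency concrete, the sketch below (our own illustration, not the paper's FRU datapath) computes a stride-2 deconvolution on a plain convolution engine by zero-inserting the input and sliding an ordinary window over it; counting multiply-accumulates shows that most of them land on the inserted zeros, which is exactly the resource under-utilization the FRU is meant to eliminate. The function name and the MAC-counting bookkeeping are ours for illustration.

```python
import numpy as np

def deconv2d_via_zero_insertion(x, w, stride=2):
    """Stride-s transposed convolution computed as zero-insertion
    followed by an ordinary (sliding-window) convolution."""
    H, W = x.shape
    K = w.shape[0]
    # 1. Insert (stride - 1) zeros between neighbouring input samples.
    up = np.zeros(((H - 1) * stride + 1, (W - 1) * stride + 1), dtype=x.dtype)
    up[::stride, ::stride] = x
    # 2. Pad by K - 1 so every output pixel sees a full K x K window.
    up = np.pad(up, K - 1)
    wf = w[::-1, ::-1]                      # kernel flip -> true convolution
    oh, ow = (H - 1) * stride + K, (W - 1) * stride + K
    out = np.zeros((oh, ow), dtype=x.dtype)
    zero_macs = total_macs = 0
    for i in range(oh):
        for j in range(ow):
            win = up[i:i + K, j:j + K]
            out[i, j] = np.sum(win * wf)
            total_macs += K * K
            zero_macs += int(np.sum(win == 0))
    return out, zero_macs / total_macs      # fraction of wasted multiplications

x = np.arange(1, 17, dtype=np.float32).reshape(4, 4)
w = np.ones((3, 3), dtype=np.float32)
y, wasted = deconv2d_via_zero_insertion(x, w)
print(y.shape, f"{wasted:.0%} of MACs operate on inserted zeros")  # (9, 9), ~80%
```

Similarly, the following minimal sketch shows integer-only inference with power-of-two ("shift") scales, which is one plausible reading of the shift scale quantization idea described above; the helper `quantize_pow2` and its per-tensor shift are our assumptions, not the paper's exact scheme. The point is that rescaling the integer accumulator reduces to an arithmetic shift in hardware rather than a floating-point multiply.

```python
import numpy as np

def quantize_pow2(x, bits=8):
    """Quantize a tensor to signed integers with a power-of-two scale 2**shift,
    so that dequantization (and accumulator rescaling) is a pure shift."""
    qmax = 2 ** (bits - 1) - 1
    shift = int(np.floor(np.log2(qmax / np.max(np.abs(x)))))
    q = np.clip(np.round(x * 2.0 ** shift), -qmax - 1, qmax).astype(np.int32)
    return q, shift

rng = np.random.default_rng(0)
a_f = rng.standard_normal(64).astype(np.float32)   # activations (float reference)
w_f = rng.standard_normal(64).astype(np.float32)   # weights (float reference)
a_q, a_shift = quantize_pow2(a_f)
w_q, w_shift = quantize_pow2(w_f)

acc = int(a_q.astype(np.int64) @ w_q.astype(np.int64))   # integer-only MACs
# Recovering the real-valued result divides by 2**(a_shift + w_shift),
# i.e. a right shift in hardware.
print(acc / 2.0 ** (a_shift + w_shift), float(a_f @ w_f))
```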
