Bactran: A Hardware Batch Normalization Implementation for CNN Training Engine

[1] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[2] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Jung Ho Ahn, et al. Restructuring Batch Normalization to Accelerate CNN Training, 2018, SysML.

[4] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.

[5] Lei Wang, et al. Systolic Array Based Accelerator and Algorithm Mapping for Deep Learning Algorithms, 2018, NPC.

[6] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Wayne Luk, et al. Towards efficient deep neural network training by FPGA-based batch-level parallelism, 2019, 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[8] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.

[9] Ge Li, et al. Mini-batch Serialization: CNN Training with Inter-layer Data Reuse, 2018, MLSys.

[10] Tomyslav Sledevic, et al. Adaptation of Convolution and Batch Normalization Layer for CNN Implementation on FPGA, 2019, 2019 Open Conference of Electrical, Electronic and Information Sciences (eStream).

[11] Elad Hoffer, et al. Scalable Methods for 8-bit Training of Neural Networks, 2018, NeurIPS.

[12] Dustin Tran, et al. Simple, Distributed, and Accelerated Probabilistic Programming, 2018, NeurIPS.

[13] Yong Dou, et al. An FPGA-based processor for training convolutional neural networks, 2017, 2017 International Conference on Field Programmable Technology (ICFPT).

[14] Lei Wang, et al. PRTSM: Hardware Data Arrangement Mechanisms for Convolutional Layer Computation on the Systolic Array, 2019, NPC.

[15] Matthew Mattina, et al. SCALE-Sim: Systolic CNN Accelerator, 2018, ArXiv.