PIR-DSP: An FPGA DSP Block Architecture for Multi-precision Deep Neural Networks
Lingli Wang | Hao Zhou | Philip H. W. Leong | SeyedRamin Rasoulinezhad
[1] Eriko Nurvitadhi,et al. High performance binary neural networks on the Xeon+FPGA™ platform , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[2] Alak Majumder,et al. A Variation-Aware Robust Gated Flip-Flop for Power-Constrained FSM Application , 2019, J. Circuits Syst. Comput.
[3] Eriko Nurvitadhi,et al. Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC , 2016, 2016 International Conference on Field-Programmable Technology (FPT).
[4] Andrew C. Ling,et al. An OpenCL™ Deep Learning Accelerator on Arria 10 , 2017, FPGA.
[5] Xuegong Zhou,et al. Accelerating low bit-width convolutional neural networks with embedded FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[6] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[7] Stratix II Device Handbook, Volume 1 , 2006 .
[8] Shengen Yan,et al. Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[9] Vaughn Betz,et al. Embracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[10] Hassan Foroosh,et al. Sparse Convolutional Neural Networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Vaughn Betz,et al. Quantifying the Gap Between FPGA and Custom CMOS to Aid Microarchitectural Design , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[12] Farinaz Koushanfar,et al. Customizing Neural Networks for Efficient FPGA Implementation , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[13] Vijay Vasudevan,et al. Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[14] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[15] Bruce A. Wooley,et al. A Two's Complement Parallel Array Multiplication Algorithm , 1973, IEEE Transactions on Computers.
[16] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Song Han,et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.
[18] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[19] Magnus Själander,et al. Multiplication Acceleration Through Twin Precision , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[20] Paolo Ienne,et al. Highly Versatile DSP Blocks for Improved FPGA Arithmetic Performance , 2010, 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines.
[21] Yu Wang,et al. A Survey of FPGA-Based Neural Network Accelerator , 2017, 1712.08934.
[22] Mansun Chan,et al. A 65nm 3.2GHz 44.2mW Low-Vt register file with robust low-capacitance dynamic local bitlines , 2015, ESSCIRC Conference 2015 - 41st European Solid-State Circuits Conference (ESSCIRC).
[23] Kurt Keutzer,et al. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Hadi Esmaeilzadeh,et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network , 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[25] Hiroki Nakahara,et al. A fully connected layer elimination for a binarized convolutional neural network on an FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[26] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.
[27] Andrew C. Ling,et al. An OpenCL(TM) Deep Learning Accelerator on Arria 10 , 2017 .
[28] Sergio Guadarrama,et al. Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.
[30] Aaron Stillmaker,et al. Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm , 2017, Integr.
[31] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[32] Shuchang Zhou,et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.
[33] David Boland,et al. Customizing Low-Precision Deep Neural Networks for FPGAs , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[34] Minxuan Zhang,et al. A Dynamic Multi-precision Fixed-Point Data Quantization Strategy for Convolutional Neural Network , 2016, NCCET.
[35] Martin Margala,et al. Exploration of Low Numeric Precision Deep Learning Inference Using Intel® FPGAs , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[36] R. Krishnamurthy,et al. An 8.8GHz 198mW 16x64b 1R/1W variation-tolerant register file in 65nm CMOS , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.
[37] Xiangyu Zhang,et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.
[38] Xuegong Zhou,et al. A high performance FPGA-based accelerator for large-scale convolutional neural networks , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[39] Viktor Prasanna,et al. Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System , 2017, FPGA.
[40] Philip Heng Wai Leong,et al. SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[41] Mark Horowitz,et al. 1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[42] Song Han,et al. Trained Ternary Quantization , 2016, ICLR.
[43] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[44] William J. Dally,et al. SLIP: Reducing wire energy in the memory hierarchy , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[45] Yoshua Bengio,et al. BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 , 2016, ArXiv.
[46] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[47] Jason Cong,et al. FPGA-based accelerator for long short-term memory recurrent neural networks , 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC).
[48] Vivienne Sze,et al. Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks , 2018, ArXiv.
[49] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.
[50] Nicholas Caldwell,et al. Scalable high-performance architecture for convolutional ternary neural networks on FPGA , 2017, 2017 27th International Conference on Field Programmable Logic and Applications (FPL).
[51] Shengen Yan,et al. Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[52] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[53] Ji-Zhong Shen,et al. Low-power level converting flip-flop with a conditional clock technique in dual supply systems , 2014, Microelectron. J.