REQ-YOLO: A Resource-Aware, Efficient Quantization Framework for Object Detection on FPGAs
暂无分享,去创建一个
Yun Liang | Yanzhi Wang | Ning Liu | Kaidi Xu | Shuo Wang | Caiwen Ding | Yun Liang | Yanzhi Wang | Caiwen Ding | Ning Liu | Kaidi Xu | Shuo Wang
[1] Philip Heng Wai Leong,et al. FINN: A Framework for Fast, Scalable Binarized Neural Network Inference , 2016, FPGA.
[2] Zhiyong Gao,et al. Hardware Implementation and Optimization of Tiny-YOLO Network , 2017, IFTC.
[3] Wayne Luk,et al. Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing , 2010, 2010 International Conference on Field-Programmable Technology.
[4] Yun Liang,et al. SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs , 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[5] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[6] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Chao Wang,et al. CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Keshab K. Parhi,et al. An efficient pipelined FFT architecture , 2003 .
[9] Yann LeCun,et al. Deep learning & convolutional networks , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[10] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[11] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[12] Ali Farhadi,et al. YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Yun Liang,et al. A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[14] Hiroki Nakahara,et al. A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for an FPGA , 2018, FPGA.
[15] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[16] Rogério Schmidt Feris,et al. A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.
[17] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[18] Wei Zhang,et al. FlexCL: An analytical performance model for OpenCL workloads on flexible FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[19] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.
[20] Yanzhi Wang,et al. Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank , 2017, ICML.
[21] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[22] Sachin S. Talathi,et al. Fixed Point Quantization of Deep Convolutional Networks , 2015, ICML.
[23] Qinru Qiu,et al. C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs , 2018, FPGA.
[24] Peng Zhang,et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[25] Peng Zhang,et al. TGPA: Tile-Grained Pipeline Architecture for Low Latency CNN Inference , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[26] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.
[27] Jian Cheng,et al. Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Rong Jin,et al. Deep Learning at Alibaba , 2017, IJCAI.
[29] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[30] Ali Farhadi,et al. YOLOv3: An Incremental Improvement , 2018, ArXiv.
[31] Smith,et al. Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications , 2007 .
[32] Yoshua Bengio,et al. Neural Networks with Few Multiplications , 2015, ICLR.
[33] Shengen Yan,et al. Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[34] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Keshab K. Parhi,et al. Pipelined Architectures for Real-Valued FFT and Hermitian-Symmetric IFFT With Real Datapaths , 2013, IEEE Transactions on Circuits and Systems II: Express Briefs.
[36] Sergey Ioffe,et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models , 2017, NIPS.
[37] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..
[38] Song Han,et al. From model to FPGA: Software-hardware co-design for efficient neural network acceleration , 2016, 2016 IEEE Hot Chips 28 Symposium (HCS).
[39] Farid Melgani,et al. Convolutional SVM Networks for Object Detection in UAV Imagery , 2018, IEEE Transactions on Geoscience and Remote Sensing.
[40] Shengen Yan,et al. Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[41] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[42] Igor Carron,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016 .
[43] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[44] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[46] Yun Liang,et al. High-Level Synthesis: Productivity, Performance, and Software Constraints , 2012, J. Electr. Comput. Eng..
[47] V. Pan. Structured Matrices and Polynomials: Unified Superfast Algorithms , 2001 .
[48] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.
[49] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[50] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Song Han,et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.