Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA
暂无分享,去创建一个
Zelong Wang | Junzhong Shen | You Huang | Yuran Qiao | Mei Wen | Chunyuan Zhang | M. Wen | Chunyuan Zhang | Y. Huang | Zelong Wang | Junzhong Shen | Yuran Qiao
[1] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[2] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Peng Zhang,et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[4] Hadi Esmaeilzadeh,et al. TABLA: A unified template-based framework for accelerating statistical machine learning , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[5] Konstantinos Kamnitsas,et al. Efficient multi‐scale 3D CNN with fully connected CRF for accurate brain lesion segmentation , 2016, Medical Image Anal..
[6] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.
[7] Andrew Lavin,et al. Fast Algorithms for Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jing Li,et al. Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network , 2017, FPGA.
[9] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[10] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[11] Xuegong Zhou,et al. A high performance FPGA-based accelerator for large-scale convolutional neural networks , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[12] Manoj Alwani,et al. Fused-layer CNN accelerators , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[13] Andrew C. Ling,et al. An OpenCL™ Deep Learning Accelerator on Arria 10 , 2017, FPGA.
[14] Zhi Liu,et al. 3D-based Deep Convolutional Neural Network for action recognition with depth sequences , 2016, Image Vis. Comput..
[15] Yijie Wang,et al. High Performance Implementation of 3D Convolutional Neural Networks on a GPU , 2017, Comput. Intell. Neurosci..
[16] Viktor Prasanna,et al. Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System , 2017, FPGA.
[17] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[18] Yu Cao,et al. Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks , 2017, FPGA.
[19] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[20] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[21] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[22] Shmuel Winograd,et al. On Multiplication of Polynomials Modulo a Polynomial , 1980, SIAM J. Comput..
[23] Yu Cao,et al. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.
[24] Jason Cong,et al. Bandwidth optimization through on-chip memory restructuring for HLS , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[25] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.