Flexibility: FPGAs and CAD in Deep Learning Acceleration
Mohamed S. Abdelfattah | Andrew C. Ling | Davor Capalija | Gordon R. Chiu | Andrew Bitar