Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference
Yiyu Shi | Edwin H.-M. Sha | Qingfeng Zhuge | Jingtong Hu | Weiwen Jiang | Xinyi Zhang | Lei Yang
[1] Yiyu Shi, et al. Resource constrained cellular neural networks for real-time obstacle detection using FPGAs, 2018, 2018 19th International Symposium on Quality Electronic Design (ISQED).
[2] Edwin Hsing-Mean Sha, et al. Heterogeneous FPGA-Based Cost-Optimal Design for Timing-Constrained CNNs, 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[3] Yu Cao, et al. Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks, 2017, FPGA.
[4] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Tulika Mitra, et al. OPTiC: Optimizing Collaborative CPU–GPU Computing on Mobile Devices With Thermal Constraints, 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[6] Jakob Engblom, et al. The worst-case execution-time problem—overview of methods and survey of tools, 2008, TECS.
[7] Lei Yang, et al. Optimal Application Mapping and Scheduling for Network-on-Chips with Computation in STT-RAM Based Router, 2019, IEEE Transactions on Computers.
[8] Reinhard Wilhelm, et al. The worst-case execution-time problem—overview of methods and survey of tools, 2008.
[9] Jinjun Xiong, et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs, 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[10] Yi Wang, et al. Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture, 2019, IEEE Transactions on Parallel and Distributed Systems.
[11] Chen Yang, et al. FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters, 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[12] Junzhong Shen, et al. Scale-out Acceleration for 3D CNN-based Lung Nodule Segmentation on a Multi-FPGA System, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[13] Erik Cambria, et al. Recent Trends in Deep Learning Based Natural Language Processing, 2017, IEEE Comput. Intell. Mag.
[14] Song Han, et al. Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware, 2016, 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).
[15] Ali Farhadi, et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Peng Chen, et al. Task mapping on SMART NoC: Contention matters, not the distance, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[17] Yi Wang, et al. Towards Cross-Platform Inference on Edge Devices with Emerging Neuromorphic Architecture, 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[18] Nong Xiao, et al. An Efficient Mapping Approach to Large-Scale DNNs on Multi-FPGA Architectures, 2019, 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[19] Jason Cong, et al. Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster, 2016, ISLPED.
[20] Christos-Savvas Bouganis, et al. fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs, 2016, 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[21] Soheil Ghiasi, et al. Cappuccino: Efficient CNN Inference Software Synthesis for Mobile System-on-Chips, 2019, IEEE Embedded Systems Letters.
[22] Yi Wang, et al. Towards Memory-Efficient Allocation of CNNs on Processing-in-Memory Architecture, 2018, IEEE Transactions on Parallel and Distributed Systems.
[23] Eric S. Chung, et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[24] Yong Wang, et al. SDA: Software-defined accelerator for large-scale DNN systems, 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).
[25] Lei Yang, et al. Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search, 2019, 2019 56th ACM/IEEE Design Automation Conference (DAC).
[26] Edwin Hsing-Mean Sha, et al. FoToNoC: A Folded Torus-Like Network-on-Chip Based Many-Core Systems-on-Chip in the Dark Silicon Era, 2017, IEEE Transactions on Parallel and Distributed Systems.
[27] Hari Angepat, et al. Serving DNNs in Real Time at Datacenter Scale with Project Brainwave, 2018, IEEE Micro.
[28] Yiyu Shi, et al. Hardware/Software Co-Exploration of Neural Architectures, 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[29] Jason Cong, et al. Scaling for edge inference of deep neural networks, 2018.
[30] Jinjun Xiong, et al. On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks, 2018, ICLR.
[31] Junzhong Shen, et al. Accelerating 3D CNN-based Lung Nodule Segmentation on a Multi-FPGA System, 2019, FPGA.
[32] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[33] Mert R. Sabuncu, et al. An Unsupervised Learning Model for Deformable Medical Image Registration, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[34] Yu Cao, et al. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks, 2016, FPGA.
[35] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[36] Paramvir Bahl, et al. Live Video Analytics at Scale with Approximation and Delay-Tolerance, 2017, NSDI.
[37] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[38] Xiaobo Sharon Hu, et al. Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[39] Michael Ferdman, et al. Maximizing CNN accelerator efficiency through resource partitioning, 2016, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).