Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud
Yu Wang | Shulin Zeng | Hanbo Sun | Huazhong Yang | Guangjun Ge | Kaiyuan Guo | Kai Zhong | Guohao Dai
[1] Andrew C. Ling,et al. An OpenCL™ Deep Learning Accelerator on Arria 10 , 2017, FPGA.
[2] Yu Wang,et al. DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[3] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[4] Andrew C. Ling,et al. An OpenCL(TM) Deep Learning Accelerator on Arria 10 , 2017 .
[5] Mohamed S. Abdelfattah,et al. DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[6] Jinjun Xiong,et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[7] Dirk Koch,et al. A Survey on FPGA Virtualization , 2018, 2018 28th International Conference on Field Programmable Logic and Applications (FPL).
[8] Yu Wang,et al. [DL] A Survey of FPGA-based Neural Network Inference Accelerators , 2019, ACM Trans. Reconfigurable Technol. Syst..
[9] Eugenio Culurciello,et al. Compiling Deep Learning Models for Custom Hardware Accelerators , 2017, ArXiv.
[10] Jason Cong,et al. A Fully Pipelined and Dynamically Composable Architecture of CGRA , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.
[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[12] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[13] Song Han,et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.
[14] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[15] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[16] Jason Cong,et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks , 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[17] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[18] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Martin D. Schatz,et al. Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications , 2018, ArXiv.
[20] Peipei Zhou. A Fully Pipelined and Dynamically Composable Architecture of CGRA (Coarse Grained Reconfigurable Architecture) , 2014, FCCM 2014.
[21] Yann LeCun,et al. Very Deep Convolutional Networks for Natural Language Processing , 2016, ArXiv.
[22] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[23] 王丛. AWS Re:Invent 2014将公有云竞争推向白热化 , 2014 .
[24] Yao Chen,et al. Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs , 2019, FPGA.
[25] Kushagra Vaid,et al. Azure Accelerated Networking: SmartNICs in the Public Cloud , 2018, NSDI.
[26] John Michael Lowe,et al. Performance Characteristics of Virtualized GPUs for Deep Learning , 2020, 2020 IEEE/ACM International Workshop on Interoperability of Supercomputing and Cloud Technologies (SuperCompCloud).
[27] Guy Lemieux,et al. ZUMA: An Open FPGA Overlay Architecture , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.
[28] Yu Wang,et al. Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA , 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[29] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Wayne Luk,et al. Deep Neural Network Approximation for Custom Hardware , 2019, ACM Comput. Surv..
[31] Yu Wang,et al. Online scheduling for FPGA computation in the Cloud , 2014, 2014 International Conference on Field-Programmable Technology (FPT).
[32] Scott Hauck,et al. Performance of partial reconfiguration in FPGA systems: A survey and a cost model , 2011, TRETS.
[33] Yu Zhang,et al. Enabling FPGAs in the cloud , 2014, Conf. Computing Frontiers.
[34] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[35] Eric S. Chung,et al. A Configurable Cloud-Scale DNN Processor for Real-Time AI , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[36] Alexandru Uta,et al. A Performance Study of Big Data Workloads in Cloud Datacenters with Network Variability , 2018, ICPE Companion.