Towards efficient deep neural network training by FPGA-based batch-level parallelism
Wayne Luk | Hongxiang Fan | Cheng Luo | Ce Guo | Shuanglong Liu | Man-Kit Sit