Accelerating HotSpots in Deep Neural Networks on a CAPI-Based FPGA
暂无分享,去创建一个
[1] Kai Yu,et al. Large-scale deep learning at Baidu , 2013, CIKM.
[2] Kenli Li,et al. A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment , 2017, IEEE Transactions on Parallel and Distributed Systems.
[3] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[4] Ninghui Sun,et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning , 2014, ASPLOS.
[5] Asit K. Mishra,et al. From high-level deep neural models to FPGAs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Damien Lyonnard,et al. Parallel programming models for a multiprocessor SoC platform applied to networking and multimedia , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[7] Weisong Shi,et al. Edge Computing: Vision and Challenges , 2016, IEEE Internet of Things Journal.
[8] Junzhong Shen,et al. FPGA‐accelerated deep convolutional neural networks for high throughput and energy efficiency , 2017, Concurr. Comput. Pract. Exp..
[9] Jeffrey Stuecheli,et al. CAPI: A Coherent Accelerator Processor Interface , 2015, IBM J. Res. Dev..
[10] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[11] Marc'Aurelio Ranzato,et al. Multi-GPU Training of ConvNets , 2013, ICLR.
[12] Heiner Giefers,et al. Accelerating arithmetic kernels with coherent attached FPGA coprocessors , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[13] Jason Cong,et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.
[14] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[15] Yangqing Jia,et al. Learning Semantic Image Representations at a Large Scale , 2014 .
[16] Eriko Nurvitadhi,et al. Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? , 2017, FPGA.
[17] Srihari Cadambi,et al. A dynamically configurable coprocessor for convolutional neural networks , 2010, ISCA.
[18] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.
[19] Eriko Nurvitadhi,et al. Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[20] Yu Cao,et al. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.
[21] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[22] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[23] Berin Martini,et al. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[24] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[25] Enrique S. Quintana-Ortí,et al. Accelerating the Lyapack library using GPUs , 2013, The Journal of Supercomputing.
[26] Moriyoshi Ohara,et al. A power-efficient FPGA accelerator: Systolic array with cache-coherent interface for pair-HMM algorithm , 2016, 2016 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS XIX).
[27] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[28] Zelong Wang,et al. Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA , 2018, FPGA.