Design space exploration for layer-parallel execution of convolutional neural networks on CGRAs
[1] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[2] Vivienne Sze, et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices, 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[3] Aviral Shrivastava, et al. dMazeRunner, 2019, ACM Trans. Embed. Comput. Syst.
[4] Wayne Luk, et al. Stream Processing Dual-Track CGRA for Object Inference, 2018, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[5] Jürgen Teich, et al. Efficient Mapping of CNNs onto Tightly Coupled Processor Arrays, 2019, J. Comput.
[6] Manoj Alwani, et al. Fused-layer CNN accelerators, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Florian Schmidt, et al. BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism, 2018, ArXiv.
[8] Peng Zhang, et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[9] Matthew Mattina, et al. SCALE-Sim: Systolic CNN Accelerator, 2018, ArXiv.
[10] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[11] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[12] Vivienne Sze, et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey, 2017, Proceedings of the IEEE.
[13] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[14] Wolfgang J. Paul, et al. Computer architecture - complexity and correctness, 2000.
[15] Nikil D. Dutt, et al. Small Memory Footprint Neural Network Accelerators, 2019, 20th International Symposium on Quality Electronic Design (ISQED).
[16] Christoforos E. Kozyrakis, et al. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators, 2019, ASPLOS.