A Gradient-Interleaved Scheduler for Energy-Efficient Backpropagation for Training Neural Networks
[1] Dumitru Erhan et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[3] Jiajun Li et al. TNPU: an efficient accelerator architecture for training convolutional neural networks, 2019, ASP-DAC.
[4] Chi-Ying Tsui et al. SparseNN: An energy-efficient neural network accelerator exploiting input and output sparsity, 2017, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[5] Jason Cong et al. HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing, 2019, FPGA.
[6] Martin Wattenberg et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.
[7] Jing Li et al. Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network, 2017, FPGA.
[8] Vivienne Sze et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey, 2017, Proceedings of the IEEE.
[9] Hyoukjun Kwon et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.
[10] Yimin Zhuang et al. Deep Fusion: A Software Scheduling Method for Memory Access Optimization, 2019, NPC.
[11] Song Han et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[12] Xiaowei Li et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[13] Keshab K. Parhi et al. High-level DSP synthesis using concurrent transformations, scheduling, and allocation, 1995, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[14] H. T. Kung et al. Systolic Arrays (for VLSI), 1978.
[15] Keshab K. Parhi et al. VLSI digital signal processing systems, 1999.
[16] Yimin Zhuang et al. Partition and Scheduling Algorithms for Neural Network Accelerators, 2019, APPT.
[17] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[19] Keshab K. Parhi. Hierarchical Folding and Synthesis of Iterative Data Flow Graphs, 2013, IEEE Transactions on Circuits and Systems II: Express Briefs.
[20] Chunhua Deng et al. PermDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices, 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[21] Jason Cong et al. Latte: Locality Aware Transformation for High-Level Synthesis, 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[22] Ninghui Sun et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning, 2014, ASPLOS.
[23] Vivienne Sze et al. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices, 2018, IEEE Journal on Emerging and Selected Topics in Circuits and Systems.
[24] Thierry Moreau et al. A Hardware–Software Blueprint for Flexible Deep Learning Specialization, 2018, IEEE Micro.
[25] Keshab K. Parhi et al. Determining the minimum iteration period of an algorithm, 1995, J. VLSI Signal Process.
[26] David A. Patterson et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[27] Matthew Mattina et al. SCALE-Sim: Systolic CNN Accelerator, 2018, arXiv.
[28] Tao Li et al. Eager Pruning: Algorithm and Architecture Support for Fast Training of Deep Neural Networks, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[29] Jia Wang et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.