Alleviating Bottlenecks for DNN Execution on GPUs via Opportunistic Computing
[1] Leibo Liu, et al. An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference, 2018, 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC).
[2] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[3] Clément Farabet, et al. Torch7: A Matlab-like Environment for Machine Learning, 2011, NIPS 2011.
[4] Boris Murmann, et al. Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications, 2019, ICML.
[5] Jeff Johnson, et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation, 2014, ICLR.
[6] Cesare Alippi, et al. Moving Convolutional Neural Networks to Embedded Systems: The AlexNet and VGG-16 Case, 2018, 2018 17th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).
[7] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[8] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[9] Tajana Simunic, et al. Efficient neural network acceleration on GPGPU using content addressable memory, 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.
[10] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[11] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[12] Lee-Sup Kim, et al. A kernel decomposition architecture for binary-weight Convolutional Neural Networks, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[13] Robert Mullins, et al. 1D-FALCON: Accelerating Deep Convolutional Neural Network Inference by Co-optimization of Models and Underlying Arithmetic Implementation, 2017, ICANN.
[14] Radu Marculescu, et al. On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems, 2017, IEEE Transactions on Computers.
[15] Olivier Temam, et al. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators, 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).
[16] Alex Krizhevsky, et al. One weird trick for parallelizing convolutional neural networks, 2014, ArXiv.
[17] Mahmut T. Kandemir, et al. Opportunistic Computing in GPU Architectures, 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[18] Tze Meng Low, et al. High Performance Zero-Memory Overhead Direct Convolutions, 2018, ICML.
[19] Scott A. Mahlke, et al. DeftNN: Addressing Bottlenecks for DNN Execution on GPUs via Synapse Vector Elimination and Near-compute Data Fission, 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).