Performance optimization of convolution calculation by blocking and sparsity on GPU
[1] Yang Yi, et al. Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs, 2016.
[2] Onur Mutlu, et al. Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[3] Xiaowei Li, et al. SqueezeFlow: A Sparse CNN Accelerator Exploiting Concise Convolution Rules, 2019, IEEE Transactions on Computers.
[4] William J. Dally, et al. SCNN: An accelerator for compressed-sparse convolutional neural networks, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[5] Andrew Lavin, et al. Fast Algorithms for Convolutional Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Shaohuai Shi, et al. Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units, 2017, ArXiv.
[7] Yuanjie Zheng, et al. Breast Cancer Multi-classification from Histopathological Images with Structured Deep Learning Model, 2017, Scientific Reports.
[8] William J. Dally, et al. The GPU Computing Era, 2010, IEEE Micro.
[9] Hassan Foroosh, et al. Sparse Convolutional Neural Networks, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[11] Da Wang, et al. Optimizing Sparse Matrix Vector Multiplication Using Cache Blocking Method on Fermi GPU, 2012, 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing.
[12] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[13] Yann LeCun, et al. Fast Training of Convolutional Networks through FFTs, 2013, ICLR.
[14] Jason Cong, et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks, 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[15] Tao Shen, et al. DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding, 2017, AAAI.
[16] Jian Wang, et al. Cross-Modal Retrieval via Deep and Bidirectional Representation Learning, 2016, IEEE Transactions on Multimedia.
[17] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[18] Peter A. Beerel, et al. Accelerating Training of Deep Neural Networks via Sparse Edge Processing, 2017, ICANN.
[19] Zhiyong Liu, et al. High-performance blob-based iterative three-dimensional reconstruction in electron tomography using multi-GPUs, 2012, BMC Bioinformatics.
[20] Jason Cong, et al. Caffeine: Toward Uniformed Representation and Acceleration for Deep Convolutional Neural Networks, 2019, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[21] Yilong Yin, et al. Choroid segmentation from Optical Coherence Tomography with graph-edge weights learned from deep convolutional neural networks, 2017, Neurocomputing.
[22] Chen-Ting Chao, et al. Accelerate DNN Performance with Sparse Matrix Compression in Halide, 2019, ICPP Workshops.
[23] Lei Zheng, et al. Joint Deep Modeling of Users and Items Using Reviews for Recommendation, 2017, WSDM.
[24] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[25] Liancheng Jia, et al. A coordinated tiling and batching framework for efficient GEMM on GPUs, 2019, PPoPP.
[26] Ying Tan, et al. GPU-based parallel particle swarm optimization, 2009, 2009 IEEE Congress on Evolutionary Computation.
[27] Yangqing Jia, et al. Learning Semantic Image Representations at a Large Scale, 2014.
[28] David Kirk, et al. NVIDIA CUDA software and GPU parallel computing architecture, 2007, ISMM '07.
[29] Michael O'Boyle, et al. Optimising Convolutional Neural Networks Inference on Low-Powered GPUs, 2019.
[30] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[31] Dianjie Lu, et al. CSCC: Convolution Split Compression Calculation Algorithm for Deep Neural Network, 2019, IEEE Access.