暂无分享,去创建一个
[1] Yury Gorbachev,et al. OpenVINO Deep Learning Workbench: Comprehensive Analysis and Tuning of Neural Networks Inference , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[2] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[3] Erich Elsen,et al. The State of Sparsity in Deep Neural Networks , 2019, ArXiv.
[4] Erich Elsen,et al. Sparse GPU Kernels for Deep Learning , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Alexander Kozlov,et al. Neural Network Compression Framework for fast model inference , 2020, ArXiv.
[6] John D. Owens,et al. Design Principles for Sparse Matrix Multiplication on the GPU , 2018, Euro-Par.
[7] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.
[8] Yafeng Yang,et al. MNN: A Universal and Efficient Inference Engine , 2020, MLSys.
[9] Ziheng Wang,et al. SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference , 2020, PACT.
[10] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[11] Chen Zhang,et al. Balanced Sparsity for Efficient DNN Inference on GPU , 2018, AAAI.
[12] Max Welling,et al. Learning Sparse Neural Networks through L0 Regularization , 2017, ICLR.
[13] Yida Wang,et al. Optimizing CNN Model Inference on CPUs , 2018, USENIX Annual Technical Conference.
[14] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[15] Timothy A. Davis,et al. The university of Florida sparse matrix collection , 2011, TOMS.
[16] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[17] David Cox,et al. Triton: an intermediate language and compiler for tiled neural network computations , 2019, MAPL@PLDI.
[18] P. Sadayappan,et al. Adaptive sparse tiling for sparse matrix multiplication , 2019, PPoPP.
[19] Amos Storkey,et al. A Closer Look at Structured Pruning for Neural Network Compression , 2018 .
[20] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[21] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[22] Michael Carbin,et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks , 2018, ICLR.
[23] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[24] Alexander M. Rush,et al. Movement Pruning: Adaptive Sparsity by Fine-Tuning , 2020, NeurIPS.
[25] Larry Carter,et al. Sparse Tiling for Stationary Iterative Methods , 2004, Int. J. High Perform. Comput. Appl..
[26] N. Santhanam,et al. Artificial-intelligence hardware: New opportunities for semiconductor companies , 2019 .
[27] Srinivasan Parthasarathy,et al. Efficient sparse-matrix multi-vector product on GPUs , 2018, HPDC.
[28] Erich Elsen,et al. Rigging the Lottery: Making All Tickets Winners , 2020, ICML.
[29] Dmitry P. Vetrov,et al. Variational Dropout Sparsifies Deep Neural Networks , 2017, ICML.
[30] Robert A. van de Geijn,et al. High-performance implementation of the level-3 BLAS , 2008, TOMS.
[31] Michael Anderson,et al. High-Performance Deep Learning via a Single Building Block , 2019, ArXiv.
[32] Gagan Agrawal,et al. A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs , 2020, PPoPP.
[33] Ziheng Wang,et al. Structured Pruning of Large Language Models , 2019, EMNLP.
[34] Pradeep Dubey,et al. Faster CNNs with Direct Sparse Convolutions and Guided Pruning , 2016, ICLR.
[35] Xuhao Chen,et al. Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs , 2018, 1802.10280.
[36] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018 .