Efficient Tensor Core-Based GPU Kernels for Structured Sparsity under Reduced Precision
Yufei Ding | Zheng Qu | Zhaodong Chen | Yuan Xie | Liu Liu