APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores
Boyuan Feng | Yufei Ding | Tong Geng | Ang Li | Yuke Wang
[1] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[2] Maciej Urbanski, et al. Intel Nervana Neural Network Processor-T (NNP-T) Fused Floating Point Many-Term Dot Product, 2020, 2020 IEEE 27th Symposium on Computer Arithmetic (ARITH).
[3] Eunhyeok Park, et al. Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[4] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[5] Yanzhi Wang, et al. Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization, 2020, IJCAI.
[6] Jian Sun, et al. Deep Learning with Low Precision by Half-Wave Gaussian Quantization, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Henk Corporaal, et al. X: A Comprehensive Analytic Model for Parallel Machines, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[8] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[9] Martin C. Herbordt, et al. O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning, 2019, ICS.
[10] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[11] Lei Deng, et al. Boosting Deep Neural Network Efficiency with Dual-Module Inference, 2020, ICML.
[12] Olivier Giroux, et al. Volta: Performance and Programmability, 2018, IEEE Micro.
[13] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[14] Ran El-Yaniv, et al. Binarized Neural Networks, 2016, ArXiv.
[15] Walter Stechele, et al. BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices, 2021, 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[16] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations, 2015, NIPS.
[17] H. T. Kung, et al. Embedded Binarized Neural Networks, 2017, EWSN.
[18] Marco Maggioni, et al. Dissecting the NVidia Turing T4 GPU via Microbenchmarking, 2019, ArXiv.
[19] Dacheng Tao, et al. Searching for Low-Bit Weights in Quantized Neural Networks, 2020, NeurIPS.
[20] Zhijian Liu, et al. HAQ: Hardware-Aware Automated Quantization With Mixed Precision, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Wei Wu, et al. O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference, 2021, IEEE Transactions on Parallel and Distributed Systems.
[22] Jian Cheng, et al. Quantized Convolutional Neural Networks for Mobile Devices, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] François Chollet, et al. Xception: Deep Learning with Depthwise Separable Convolutions, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[25] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[26] Saibal Mukhopadhyay, et al. Efficient Object Detection Using Embedded Binarized Neural Networks, 2018, J. Signal Process. Syst.
[27] Zhiwei Xiong, et al. Tracking by Instance Detection: A Meta-Learning Approach, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Ang Li, et al. Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs, 2021, IEEE Transactions on Parallel and Distributed Systems.
[29] Marco Maggioni, et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking, 2018, ArXiv.
[30] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Shiyue Zhang, et al. ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization, 2020, EMNLP.
[32] Tor M. Aamodt, et al. Modeling Deep Learning Accelerator Enabled GPUs, 2018, 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[33] Shuchang Zhou, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016, ArXiv.
[34] Boyuan Feng, et al. DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions, 2021, ArXiv.
[35] Luca Benini, et al. XpulpNN: Accelerating Quantized Neural Networks on RISC-V Processors Through ISA Extensions, 2020, 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[36] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[37] Ronny Krashinsky, et al. NVIDIA A100 Tensor Core GPU: Performance and Innovation, 2021, IEEE Micro.
[38] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[39] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Wei Niu, et al. PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices, 2020, AAAI.
[41] Henk Corporaal, et al. Critical points based register-concurrency autotuning for GPUs, 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[42] Mingkui Tan, et al. Training Quantized Neural Networks With a Full-Precision Auxiliary Module, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Henk Corporaal, et al. Transit: A Visual Analytical Model for Multithreaded Machines, 2015, HPDC.
[44] Jieping Ye, et al. AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates, 2020, AAAI.
[45] G. Hua, et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks, 2018, ECCV.
[46] Martin C. Herbordt, et al. BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets, 2019, SC.