GACT: Activation Compressed Training for General Architectures
Xiaoxuan Liu | Lianmin Zheng | Dequan Wang | Yukuo Cen | Weize Chen | Xu Han | Jianfei Chen | Zhiyuan Liu | Jie Tang | Joey Gonzalez | Michael W. Mahoney | Alvin Cheung
[1] Markus N. Rabe, et al. Self-attention Does Not Need $O(n^2)$ Memory, 2021, arXiv:2112.05682.
[2] Bohan Zhuang, et al. Mesa: A Memory-saving Training Framework for Transformers, 2021, arXiv.
[3] Ion Stoica, et al. ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training, 2021, ICML.
[4] Olatunji Ruwase, et al. ZeRO-Offload: Democratizing Billion-Scale Model Training, 2021, USENIX ATC.
[5] Noam M. Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, J. Mach. Learn. Res.
[6] Guanpeng Li, et al. A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression, 2020, PPoPP.
[7] Joseph E. Gonzalez, et al. A Statistical Framework for Low-bitwidth Training of Deep Neural Networks, 2020, NeurIPS.
[8] Jiawei Jiang, et al. Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript, 2020, ICML.
[9] Yaliang Li, et al. Simple and Deep Graph Convolutional Networks, 2020, ICML.
[10] Tianqi Chen, et al. Dynamic Tensor Rematerialization, 2020, ICLR.
[11] J. Leskovec, et al. Open Graph Benchmark: Datasets for Machine Learning on Graphs, 2020, NeurIPS.
[12] Tor M. Aamodt, et al. JPEG-ACT: Accelerating Deep Learning via Transform-based Lossy Compression, 2020, ISCA.
[13] Gu Jin, et al. SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping, 2020, ASPLOS.
[14] Hai Jin, et al. Capuchin: Tensor-based GPU Memory Management for Deep Learning, 2020, ASPLOS.
[15] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[16] P. Abbeel, et al. Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization, 2019, MLSys.
[17] Rajgopal Kannan, et al. GraphSAINT: Graph Sampling Based Inductive Learning Method, 2019, ICLR.
[18] Benjamin Moseley, et al. Backprop with Approximate Activations for Memory-efficient Network Training, 2019, NeurIPS.
[19] Daniel Brand, et al. Training Deep Neural Networks with 8-bit Floating Point Numbers, 2018, NeurIPS.
[20] Elad Hoffer, et al. Scalable Methods for 8-bit Training of Neural Networks, 2018, NeurIPS.
[21] Samuel R. Bowman, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[22] Shuang Wu, et al. Training and Inference with Integers in Deep Neural Networks, 2018, ICLR.
[23] Zenglin Xu, et al. Superneurons: Dynamic GPU Memory Management for Training Deep Neural Networks, 2018, PPoPP.
[24] Hao Wu, et al. Mixed Precision Training, 2017, ICLR.
[25] Ross B. Girshick, et al. Focal Loss for Dense Object Detection, 2017, IEEE Trans. Pattern Anal. Mach. Intell.
[26] Kaiming He, et al. Focal Loss for Dense Object Detection, 2017, ICCV.
[27] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[28] Max Welling, et al. Semi-Supervised Classification with Graph Convolutional Networks, 2016, ICLR.
[29] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[30] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, arXiv.
[31] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[32] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with Binary Weights During Propagations, 2015, NIPS.
[33] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Trans. Pattern Anal. Mach. Intell.
[34] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[35] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[36] Fei-Fei Li, et al. ImageNet: A Large-Scale Hierarchical Image Database, 2009, CVPR.
[37] Fan Yang, et al. EXACT: Scalable Graph Neural Networks Training via Extreme Activation Compression, 2022, ICLR.
[38] Tor M. Aamodt, et al. AC-GC: Lossy Activation Compression with Guaranteed Convergence, 2021, NeurIPS.
[39] Chang Zhou, et al. CogDL: A Toolkit for Deep Learning on Graphs, 2021.
[40] Olivier Beaumont, et al. Efficient Combination of Rematerialization and Offloading for Training DNNs, 2021, NeurIPS.
[41] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, ICCV.
[42] Swagath Venkataramani, et al. Ultra-Low Precision 4-bit Training of Deep Neural Networks, 2020, NeurIPS.
[43] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[44] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.