FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Yang You | R. Wu | Xiwen Zhang | Zhongming Yu | Shenggan Cheng | Bin-Rui Li | Jian Peng