GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Yanping Huang | Yonglong Cheng | Dehao Chen | HyoukJoong Lee | Jiquan Ngiam | Quoc V. Le | Zhifeng Chen
[1] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[2] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[3] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[4] Vijay Vasudevan,et al. Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Seung Woo Lee,et al. Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[6] Nikhil R. Devanur,et al. PipeDream: Fast and Efficient Pipeline Parallel DNN Training , 2018, ArXiv.
[7] George Karypis,et al. Introduction to Parallel Computing , 1994 .
[8] Jonathan Krause,et al. Collecting a Large-scale Dataset of Fine-grained Cars , 2013 .
[9] Andreas Griewank,et al. Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation , 2000, TOMS.
[10] Dong Yu,et al. Pipelined Back-Propagation for Context-Dependent Deep Neural Networks , 2012, INTERSPEECH.
[11] Myle Ott,et al. Scaling Neural Machine Translation , 2018, WMT.
[12] Chia-Lin Yang,et al. Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform , 2018, ArXiv.
[13] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[14] Quoc V. Le,et al. AutoAugment: Learning Augmentation Policies from Data , 2018, ArXiv.
[15] Matthieu Guillaumin,et al. Food-101 - Mining Discriminative Components with Random Forests , 2014, ECCV.
[16] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[17] Samy Bengio,et al. Device Placement Optimization with Reinforcement Learning , 2017, ICML.
[18] Gérard Dreyfus,et al. Performance analysis of a pipelined backpropagation parallel algorithm , 1993, IEEE Trans. Neural Networks.
[19] Quoc V. Le,et al. Don't Decay the Learning Rate, Increase the Batch Size , 2017, ICLR.
[20] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[21] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Richard Socher,et al. Learned in Translation: Contextualized Word Vectors , 2017, NIPS.
[23] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[24] Tianqi Chen,et al. Training Deep Nets with Sublinear Memory Cost , 2016, ArXiv.
[26] Yuning Jiang,et al. MegDet: A Large Mini-Batch Object Detector , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Yang Song,et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[28] Yu Zhang,et al. Highway long short-term memory RNNs for distant speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[30] Ankur Bapna,et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation , 2018, ACL.
[31] Alex Krizhevsky,et al. One weird trick for parallelizing convolutional neural networks , 2014, ArXiv.
[32] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[33] Ankur Bapna,et al. Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges , 2019, ArXiv.
[34] Subhransu Maji,et al. Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.
[35] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[37] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[38] Yuxin Peng,et al. Object-Part Attention Model for Fine-Grained Image Classification , 2017, IEEE Transactions on Image Processing.
[39] Dustin Tran,et al. Mesh-TensorFlow: Deep Learning for Supercomputers , 2018, NeurIPS.
[40] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[41] Quoc V. Le,et al. Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[44] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[45] Ross B. Girshick,et al. Fast R-CNN , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[46] Jeff Donahue,et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.
[47] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[48] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[49] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[50] Xiu-Shen Wei,et al. Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization , 2018, Pattern Recognit..
[51] C. V. Jawahar,et al. Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[52] Quoc V. Le,et al. Domain Adaptive Transfer Learning , 2018 .
[53] Tengyu Ma,et al. Fixup Initialization: Residual Learning Without Normalization , 2019, ICLR.
[54] Kaiming He,et al. Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.
[55] Yann Dauphin,et al. Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.
[56] Yuanzhou Yang,et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes , 2018, ArXiv.
[57] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[59] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[60] Seunghak Lee,et al. On Model Parallelization and Scheduling Strategies for Distributed Machine Learning , 2014, NIPS.
[61] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[62] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[63] Alok Aggarwal,et al. Regularized Evolution for Image Classifier Architecture Search , 2018, AAAI.
[64] Kunle Olukotun,et al. Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark , 2018, ACM SIGOPS Oper. Syst. Rev..
[65] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[66] Graham W. Taylor,et al. Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.
[67] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[68] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[69] Jonathan Krause,et al. The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition , 2015, ECCV.
[70] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[71] Dahua Lin,et al. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[73] Shuicheng Yan,et al. Dual Path Networks , 2017, NIPS.
[74] Trevor Darrell,et al. Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.