Geoffrey X. Yu | Yubo Gao | Pavel Golikov | Gennady Pekhimenko
[1] David Patterson, et al. MLPerf Training Benchmark, 2019, MLSys.
[2] Nikhil R. Devanur, et al. PipeDream: Generalized Pipeline Parallelism for DNN Training, 2019, SOSP.
[3] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[4] P. Glaskowsky. NVIDIA's Fermi: The First Complete GPU Computing Architecture, 2009.
[5] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[6] Yinda Zhang, et al. LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop, 2015, ArXiv.
[7] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] Michael Carbin, et al. Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks, 2018, ICML.
[9] Geoffrey E. Hinton, et al. ImageNet Classification with Deep Convolutional Neural Networks, 2012, Commun. ACM.
[10] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[11] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[12] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[13] Ameet Talwalkar, et al. Paleo: A Performance Model for Deep Neural Networks, 2016, ICLR.
[14] Lidong Zhou, et al. Astra: Exploiting Predictability to Optimize Deep Learning, 2019, ASPLOS.
[15] Thierry Moreau, et al. Learning to Optimize Tensor Programs, 2018, NeurIPS.
[16] Wencong Xiao, et al. Gandiva: Introspective Cluster Scheduling for Deep Learning, 2018, OSDI.
[17] Amar Phanishayee, et al. Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training, 2020, USENIX Annual Technical Conference.
[18] Samuel Williams, et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures, 2008.
[19] Gu-Yeon Wei, et al. Fathom: Reference Workloads for Modern Deep Learning Methods, 2016, IEEE International Symposium on Workload Characterization (IISWC).
[20] Soumith Chintala, et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015, ICLR.
[21] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.
[22] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[23] A. Stephen McGough, et al. Predicting the Computational Cost of Deep Learning Models, 2018, IEEE International Conference on Big Data (Big Data).
[24] Krish Shankar, et al. Azure Machine Learning, 2019.
[25] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Amar Phanishayee, et al. Benchmarking and Analyzing Deep Neural Network Training, 2018, IEEE International Symposium on Workload Characterization (IISWC).
[27] Kunle Olukotun, et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition, 2017.
[28] Alexander Aiken, et al. Beyond Data and Model Parallelism for Deep Neural Networks, 2018, SysML.
[29] Karin M. Verspoor, et al. Findings of the 2016 Conference on Machine Translation, 2016, WMT.
[30] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[31] Guo Wei, et al. Iteration Time Prediction for CNN in Multi-GPU Platform: Modeling and Analysis, 2019, IEEE Access.
[32] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[33] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[34] Ioannis Mitliagkas, et al. Asynchrony Begets Momentum, with an Application to Deep Learning, 2016, 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[35] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Hao Wu, et al. Mixed Precision Training, 2017, ICLR.
[37] David A. Patterson, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit, 2017, ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[38] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[39] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Amar Phanishayee, et al. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads, 2020, OSDI.
[41] Byung-Gon Chun, et al. Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks, 2018, EuroSys.
[42] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[43] Antonio J. Peña, et al. Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs, 2019, IEEE Access.
[44] G. Kane. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations; Vol. 2: Psychological and Biological Models, 1994.
[45] Sabri Pllana, et al. Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures, 2019, International Conference on High Performance Computing & Simulation (HPCS).
[46] Frédo Durand, et al. Learning to Optimize Halide with Tree Search and Random Programs, 2019, ACM Trans. Graph.