Achim Streit | Charlotte Debus | Martin Siggel | Daniel Coquelin | Markus Goetz | Fabrice von der Lehr | James Kahn
[1] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[2] Masafumi Yamazaki, et al. Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds, 2019, arXiv.
[3] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Review.
[4] Kunle Olukotun, et al. Understanding and optimizing asynchronous low-precision stochastic gradient descent, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[5] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, arXiv.
[6] Karan Sapra, et al. Hierarchical Multi-Scale Attention for Semantic Segmentation, 2020, arXiv.
[7] Andy B. Yoo, et al. X-ray Pulse Compression Using Strained Crystals, 2002.
[8] P. Pfeifer, et al. A Brief Primer on Probability Distributions, 2008, SSRN Electronic Journal.
[9] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[10] Wei Zhang, et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent, 2017, ICML.
[11] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Marcin Junczys-Dowmunt, et al. Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation, 2018, EMNLP.
[13] Silvio Savarese, et al. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Rong Zheng, et al. Asynchronous stochastic gradient descent for DNN training, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Ji Liu, et al. Staleness-Aware Async-SGD for Distributed Deep Learning, 2015, IJCAI.
[16] Xu Liu, et al. Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect, 2019, IEEE Transactions on Parallel and Distributed Systems.
[17] HeAT - a Distributed and GPU-accelerated Tensor Framework for Data Analytics, 2020, arXiv.
[18] Parijat Dube, et al. Slow and Stale Gradients Can Win the Race, 2018, IEEE Journal on Selected Areas in Information Theory.
[19] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[20] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[21] Dorian Krause, et al. JUWELS: Modular Tier-0/1 Supercomputer at Jülich Supercomputing Centre, 2019, Journal of large-scale research facilities JLSRF.
[22] Sebastian Ramos, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Rio Yokota, et al. Exhaustive Study of Hierarchical AllReduce Patterns for Large Messages Between GPUs, 2019, 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[24] Yang Wang, et al. Region Mutual Information Loss for Semantic Segmentation, 2019, NeurIPS.
[25] Tao Lin, et al. Don't Use Large Mini-Batches, Use Local SGD, 2018, ICLR.
[26] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, arXiv.
[27] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, 2018.
[28] Hiroaki Mikami, et al. Massively Distributed SGD: ImageNet/ResNet-50 Training in a Flash, 2018.