Yosuke Oyama | Peter Nugent | Brian Van Essen | Peter Harrington | Naoya Maruyama | Nikoli Dryden | Satoshi Matsuoka | Jan Balewski | Erin McCarthy
[1] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[2] Sai Narasimhamurthy, et al. Characterizing Deep-Learning I/O Workloads in TensorFlow, 2018, IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS).
[3] Alex Krizhevsky, et al. One weird trick for parallelizing convolutional neural networks, 2014, ArXiv.
[4] Takuya Akiba, et al. PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track, 2018, ArXiv.
[5] Lorenzo Torresani, et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2015, IEEE International Conference on Computer Vision (ICCV).
[6] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[7] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.
[8] Trishul M. Chilimbi, et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System, 2014, OSDI.
[9] Zenglin Xu, et al. Superneurons: dynamic GPU memory management for training deep neural networks, 2018, PPoPP.
[10] Toshio Endo, et al. ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity, 2017, IEEE International Conference on Big Data (Big Data).
[11] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.
[12] Hao Chen, et al. The Liver Tumor Segmentation Benchmark (LiTS), 2019, Medical Image Anal.
[13] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[14] Seyed-Ahmad Ahmadi, et al. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, 2016, Fourth International Conference on 3D Vision (3DV).
[15] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Rajeev Thakur, et al. Data sieving and collective I/O in ROMIO, 1998, Proceedings of Frontiers '99: Seventh Symposium on the Frontiers of Massively Parallel Computation.
[17] Wu-chun Feng, et al. Towards Scalable Deep Learning via I/O Analysis and Optimization, 2017, IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[18] Weikuan Yu, et al. Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems, 2018, IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
[19] Thomas Brox, et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation, 2016, MICCAI.
[20] Dustin Tran, et al. Mesh-TensorFlow: Deep Learning for Supercomputers, 2018, NeurIPS.
[21] Satoshi Matsuoka, et al. Predicting statistics of asynchronous SGD parameters for a large-scale distributed deep learning system on GPU supercomputers, 2016, IEEE International Conference on Big Data (Big Data).
[22] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[23] Hyojin Kim, et al. LBANN: Livermore Big Artificial Neural Network HPC toolkit, 2015, MLHPC@SC.
[24] Prabhat, et al. CosmoFlow: Using Deep Learning to Learn the Universe at Scale, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Nikhil R. Devanur, et al. PipeDream: generalized pipeline parallelism for DNN training, 2019, SOSP.
[26] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, ArXiv.
[27] Chen Meng, et al. Training Deeper Models by GPU Memory Optimization on TensorFlow, 2017.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[29] Chong Wang, et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, 2015, ICML.
[30] Dong Yu, et al. Pipelined Back-Propagation for Context-Dependent Deep Neural Networks, 2012, INTERSPEECH.
[31] Natalia Gimelshein, et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design, 2016, 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[32] Prabhat, et al. Exascale Deep Learning for Climate Analytics, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[33] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[34] Santiago Rodriguez Papa, et al. Conduit, 2021, Proceedings of the Genetic and Evolutionary Computation Conference Companion.
[35] Anand Pratap Singh, et al. New Approaches in Turbulence and Transition Modeling Using Data-driven Techniques, 2015.
[36] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[37] John Shalf, et al. Tuning HDF5 for Lustre File Systems, 2010.
[38] Torsten Hoefler, et al. Accelerating Deep Learning Frameworks with Micro-Batches, 2018, IEEE International Conference on Cluster Computing (CLUSTER).
[39] Sam Ade Jacobs, et al. Parallelizing Training of Deep Generative Models on Massive Scientific Datasets, 2019, IEEE International Conference on Cluster Computing (CLUSTER).
[40] Kurt Keutzer, et al. Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization, 2019, MLSys.
[41] Barnabás Póczos, et al. Estimating Cosmological Parameters from the Dark Matter Distribution, 2016, ICML.
[42] Nam Sung Kim, et al. Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training, 2018, NeurIPS.
[43] Marc Snir, et al. Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism, 2019, IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[44] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.
[45] Sergio Gomez Colmenarejo, et al. TF-Replicator: Distributed Machine Learning for Researchers, 2019, ArXiv.
[46] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, 2018.
[47] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[48] Alexander Aiken, et al. Beyond Data and Model Parallelism for Deep Neural Networks, 2018, SysML.
[49] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.
[50] Weikuan Yu, et al. I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning, 2019, ICPP.
[51] Jianwei Li, et al. Parallel netCDF: A High-Performance Scientific I/O Interface, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[52] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[53] Kurt Keutzer, et al. Integrated Model, Batch, and Domain Parallelism in Training Neural Networks, 2017, SPAA.
[54] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[55] Travis M. Drucker, et al. High Resolution Medical Image Analysis with Spatial Partitioning, 2019, ArXiv.
[56] Masafumi Yamazaki, et al. Yet Another Accelerated SGD: ResNet-50 Training on ImageNet in 74.7 seconds, 2019, ArXiv.
[57] Marc Snir, et al. Channel and filter parallelism for large-scale CNN training, 2019, SC.
[58] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.