Clairvoyant Prefetching for Distributed Machine Learning I/O
[1] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[2] John Shalf, et al. Tuning HDF5 for Lustre File Systems, 2010.
[3] Guojing Cong, et al. Accelerating Data Loading in Deep Neural Network Training, 2019, 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC).
[4] Anna R. Karlin, et al. Near-Optimal Parallel Prefetching and Caching, 2000, SIAM J. Comput.
[5] Apostol Natsev, et al. YouTube-8M: A Large-Scale Video Classification Benchmark, 2016, ArXiv.
[6] Pavan Balaji, et al. Scalable Deep Learning via I/O Analysis and Optimization, 2019, TOPC.
[7] Sem C. Borst, et al. Distributed Caching Algorithms for Content Distribution Networks, 2010, Proceedings IEEE INFOCOM.
[8] Bolei Zhou, et al. Places: A 10 Million Image Database for Scene Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size, 2016, ArXiv.
[10] Jeffrey S. Vetter, et al. NVIDIA Tensor Core Programmability, Performance & Precision, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[11] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis, 2018.
[12] Franklin Abodo, et al. Detecting Work Zones in SHRP 2 NDS Videos Using Deep Learning Based Computer Vision, 2018, 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA).
[13] Prabhat, et al. CosmoFlow: Using Deep Learning to Learn the Universe at Scale, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Weikuan Yu, et al. I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning, 2019, ICPP.
[15] Christoph Ambühl. Parallel prefetching and caching is NP-hard, 2003.
[16] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018.
[17] Kishor S. Trivedi, et al. A simple characterization of provably efficient prefetching algorithms, 2002, Proceedings International Conference on Dependable Systems and Networks.
[18] Anna R. Karlin, et al. A study of integrated prefetching and caching strategies, 1995, SIGMETRICS '95/PERFORMANCE '95.
[19] Marc Snir, et al. Aluminum: An Asynchronous, GPU-Aware Communication Library Optimized for Large-Scale Training of Deep Neural Networks on HPC Systems, 2018, 2018 IEEE/ACM Machine Learning in HPC Environments (MLHPC).
[20] Chen Sun, et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[21] Weikuan Yu, et al. Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems, 2018, 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
[22] Susanne Albers, et al. Integrated prefetching and caching in single and parallel disk systems, 2003, SPAA '03.
[23] Sam Ade Jacobs, et al. Parallelizing Training of Deep Generative Models on Massive Scientific Datasets, 2019, 2019 IEEE International Conference on Cluster Computing (CLUSTER).
[24] Yosuke Oyama, et al. The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs With Hybrid Parallelism, 2020, IEEE Transactions on Parallel and Distributed Systems.
[25] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[26] Laszlo A. Belady, et al. A Study of Replacement Algorithms for a Virtual-Storage Computer, 1966, IBM Syst. J.
[27] Kaiming He, et al. Exploring the Limits of Weakly Supervised Pretraining, 2018, ECCV.
[28] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[29] Prabhat, et al. Exascale Deep Learning for Climate Analytics, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[30] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[31] Sai Narasimhamurthy, et al. Characterizing Deep-Learning I/O Workloads in TensorFlow, 2018, 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS).
[32] Susanne Albers, et al. Minimizing Stall Time in Single and Parallel Disk Systems Using Multicommodity Network Flows, 2001, RANDOM-APPROX.
[33] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[34] Rajeev Thakur, et al. Data sieving and collective I/O in ROMIO, 1998, Frontiers '99: Seventh Symposium on the Frontiers of Massively Parallel Computation.
[35] Christian Koch, et al. Category-aware hierarchical caching for video-on-demand content on YouTube, 2018, MMSys.
[36] Susanne Albers, et al. Minimizing stall time in single and parallel disk systems, 1998, STOC '98.
[37] Jianwei Li, et al. Parallel netCDF: A High-Performance Scientific I/O Interface, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[38] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[39] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[40] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.