HPC AI500 V2.0: The Methodology, Tools, and Metrics for Benchmarking HPC AI Systems
Wanling Gao | Chuanxin Lan | Zihan Jiang | Jianfeng Zhan | Xingwang Xiong | Fei Tang | Lei Wang | Hongxiao Li | Chunjie Luo