Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system
[1] Samuel Williams, et al. Auto-tuning performance on multicore computers, 2008.
[2] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[3] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Samuel Williams, et al. An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability, 2018, 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC).
[5] Eric L. Shirley, et al. Electron Self-Energy Calculation Using a General Multi-Pole Approximation, 2003.
[6] Samuel Williams, et al. Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor, 2016, ISC Workshops.
[7] Torsten Hoefler, et al. Demystifying Parallel and Distributed Deep Learning, 2018, ACM Comput. Surv.
[8] Samuel Williams, et al. Evaluating and Optimizing the NERSC Workload on Knights Landing, 2016, 2016 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).
[9] Samuel Williams, et al. A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization, 2018, ISC.