Paper Information - A Metric for Evaluating Supercomputer Performance in the Era of Extreme Heterogeneity


When acquiring a supercomputer, it is desirable to specify its performance using a single number. In many procurements, this number is stated as a performance increase over a current-generation platform; for example, machine A provides 10 times greater performance than machine B. Determining such a number is not necessarily simple: there is no universal agreement on how the calculation should be performed, and each facility usually uses its own method. In the future, the landscape will be further complicated because systems will contain a heterogeneous mix of node types and, by design, not every application will run on every node type. For example, at the National Energy Research Scientific Computing Center (NERSC), the Cori supercomputer contains two node types: nodes based on dual-socket Intel Xeon (Haswell) processors and nodes based on Intel Xeon Phi (Knights Landing) processors. However, NERSC evaluated these two partitions separately, without using a single, combined performance metric. NERSC will be deploying its next-generation machine, NERSC-9, in 2020 and anticipates that it too will contain a heterogeneous mix of node types. The purpose of this paper is to describe a single performance metric for a heterogeneous system.
