HPCG Benchmark: a New Metric for Ranking High Performance Computing Systems

We describe a new high performance conjugate gradient (HPCG) benchmark. HPCG is composed of computations and data access patterns commonly found in scientific applications. HPCG strives to correlate better with existing codes from the computational science domain and to be representative of their performance. HPCG is meant to help drive computer system design and implementation in directions that will better impact future performance improvement.
