Performance Evaluation of HPGMG on Tianhe-2: Early Experience

In this paper, we evaluate and analyze the performance of HPGMG on the world’s largest supercomputer, Tianhe-2. We design and implement a general testing framework based on the performance-related parameters of HPGMG-FV and the architectural characteristics of Tianhe-2. The framework automatically constructs testing spaces, filters them by constraints, refines them according to actual run results, and extracts useful information from output files. Using this framework, we evaluate the performance of HPGMG at small scale with different tunable parameters, and at a large scale of 8192 nodes, where it achieves an overall performance of \(5.511 \times 10^{11}\) DOF/s.
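The first two steps of the framework, constructing a testing space and filtering it by constraints, can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names (`procs_per_node`, `threads_per_proc`, `log2_box_dim`), their values, and the single constraint are hypothetical, and the figure of 24 CPU cores per node is an assumption made for the example.

```python
from itertools import product

# Assumed number of CPU cores per node; used only for this illustration.
CORES_PER_NODE = 24

def build_space(params):
    """Return every combination of parameter values as a list of dicts."""
    names = list(params)
    return [dict(zip(names, values)) for values in product(*params.values())]

def filter_space(space, constraints):
    """Keep only the configurations that satisfy all constraints."""
    return [p for p in space if all(check(p) for check in constraints)]

# Hypothetical tunable parameters and candidate values.
params = {
    "procs_per_node": [1, 2, 4, 8],
    "threads_per_proc": [1, 3, 6, 12, 24],
    "log2_box_dim": [5, 6, 7],
}

# Hypothetical constraint: do not oversubscribe the cores of a node.
constraints = [
    lambda p: p["procs_per_node"] * p["threads_per_proc"] <= CORES_PER_NODE,
]

space = build_space(params)
valid = filter_space(space, constraints)
print(len(space), "candidate configurations,", len(valid), "after filtering")
```

In this sketch each constraint is a predicate over one configuration, so further rules (for example, memory limits per process) could be added to the list without changing the filtering code.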
