Extreme-scaling applications en route to exascale

Feedback from the previous year's very successful workshop motivated the organisation of a three-day workshop from 1 to 3 February 2016, during which the 28-rack JUQUEEN Blue Gene/Q system with 458 752 cores was reserved for over 50 hours. Eight international code teams were selected to use this opportunity to investigate and improve their application scalability, assisted by staff from the JSC Simulation Laboratories and Cross-Sectional Teams. Ultimately, seven teams ran their codes successfully on the full JUQUEEN system. The strong scalability demonstrated by Code_Saturne and Seven-League Hydro, each running 16 MPI processes per compute node with 4 OpenMP threads per process for a total of 1 835 008 threads, qualified them for High-Q Club membership. Existing members CIAO and iFETI showed that additional solvers in their codes also scaled acceptably. Furthermore, large-scale in-situ interactive visualisation was demonstrated with a CIAO simulation using 458 752 MPI processes on 28 racks, coupled via JUSITU to VisIt. The two adaptive mesh refinement utilities, ICI and p4est, scaled to 458 752 and 917 504 MPI ranks respectively, but both encountered problems loading large meshes. Parallel file I/O issues also hindered large-scale executions of PFLOTRAN. The poor performance of a NEST import module, which loaded and connected 1.9 TiB of neuron and synapse data, was traced to a mismatch between its internal data structures and the HDF5 file objects; this prevented the use of MPI collective file reading and, once rectified, is expected to enable large-scale neuronal network simulations. A comparative analysis with the 25 codes in the High-Q Club at the start of 2016, five of which qualified at the previous workshop, is provided. Although the results were more mixed than in earlier years, we learnt more about application file I/O limitations and inefficiencies, which continue to be the primary inhibitor of large-scale simulations.
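To make the hybrid configuration concrete: 28 racks of 1024 nodes, each node running 16 MPI processes with 4 OpenMP threads per process, give 458 752 ranks and 1 835 008 threads in total. The following minimal C sketch of such a hybrid MPI+OpenMP setup is illustrative only and is not taken from either application; the 16x4 layout itself is fixed by the job script and launcher flags, not by the code:

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks, nthreads = 0;

        /* Request threaded MPI: each rank drives its own OpenMP team. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            #pragma omp master
            nthreads = omp_get_num_threads();
        }

        if (rank == 0) {
            /* With 16 ranks/node and 4 threads/rank on 28 racks
               (28 672 nodes): 458 752 ranks x 4 = 1 835 008 threads. */
            printf("%d MPI ranks x %d OpenMP threads = %d threads total\n",
                   nranks, nthreads, nranks * nthreads);
        }
        MPI_Finalize();
        return 0;
    }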
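For the NEST I/O finding, the relevant HDF5 mechanism is the collective MPI-IO transfer mode, which the library can only honour when the in-memory datatypes match the file objects. A minimal sketch of a collective parallel read follows, assuming a parallel HDF5 build; the file name "neurons.h5" and dataset "/synapses" are hypothetical placeholders:

    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Open the file for parallel access across all ranks. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fopen("neurons.h5", H5F_ACC_RDONLY, fapl);
        hid_t dset = H5Dopen2(file, "/synapses", H5P_DEFAULT);

        /* Request collective (rather than independent) MPI-IO transfers;
           this only takes effect when memory and file datatypes match. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        double buf[1024];  /* sized to the (hypothetical) dataset extent */
        H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);

        H5Pclose(dxpl); H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }

In practice each rank would first select its own portion of the dataset with H5Sselect_hyperslab, so that the collective read is distributed across ranks rather than replicated on every one.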
