CFD for Next Generation Hardware: Experiences with Proxy Applications.

Production-level computational fluid dynamics application codes need significant changes to run efficiently on the next generation of computing systems, which employ many-core processors and accelerators. In this work, we evaluate our production turbulent compressible flow application for suitability in an MPI+X parallel programming model, identify potential bottlenecks, and perform a preliminary evaluation of a surrogate proxy application, MiniAero. The proxy application is evaluated on four platforms without code modifications, comparing MPI+X to standard MPI-only execution where applicable. In these initial results using MiniAero, MPI-only exhibits the best performance on the Intel Xeon architecture, but on Blue Gene/Q, the Nvidia K20X, and the Intel Xeon Phi, MPI+X performs significantly better than MPI-only.
