Simulation at Extreme-Scale: Co-Design Thinking and Practices

As computer architectures evolve towards exaflop/s, floating-point performance is increasing rapidly (the so-called “free” flops), while memory and network bandwidth are improving much more slowly. Numerical simulation is challenged by this imbalance between the growth in compute power and the growth in data-movement capability. In this paper, after reviewing the hardware and software challenges of moving towards exascale computing, we present co-design thinking for selecting, optimizing, and developing numerical algorithms and simulation tools to meet the challenges of simulation at extreme scale. Examples are presented to demonstrate this new way of thinking and its effectiveness on emerging architectures.
