Exploiting domain knowledge to optimize parallel computational mechanics codes

An important emerging problem domain in computational science and engineering is the development of multi-scale computational methods for complex problems in mechanics that span multiple spatial and temporal scales. An attractive approach to solving these problems is recursive decomposition: the problem is broken up into a tree of loosely coupled sub-problems, which can be solved independently and then coupled back together to obtain the desired solution. However, a given problem can be solved in myriad ways by coupling the sub-problems together in different tree orders. As we argue in this paper, the space of possible orders is vast, the performance gap between an arbitrary order and the best order is potentially quite large, and the likelihood that a domain scientist can find the best order to solve a problem on a particular machine is vanishingly small. We present a system that uses domain-specific knowledge captured in computational libraries to optimize code written in a conventional language (C). The system generates efficient coupling orders to solve computational mechanics problems using recursive decomposition. It adopts the inspector-executor paradigm: the problem is inspected, and a novel heuristic finds an effective implementation based on domain properties evaluated by a cost model. The derived implementation is then executed by a parallel run-time system (Cilk), which achieves optimal parallel performance. We demonstrate that our cost model is highly correlated with actual application runtime and that our proposed technique outperforms non-decomposed and non-multi-scale methods. The code generated by the heuristic also outperforms alternate scheduling strategies, as well as over 99% of randomly generated recursive decompositions sampled from the space of possible solutions.
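To give a rough sense of how quickly the space of coupling orders grows, the following is a minimal sketch in C. It assumes, purely for illustration, that a coupling order is an unordered binary tree whose leaves are the n labeled sub-problems; the abstract does not say that the system restricts itself to such trees. Under that assumption the number of distinct trees is the double factorial (2n-3)!!. The function name `count_coupling_trees` is hypothetical and is not part of the system described above.

```c
#include <stdint.h>

/*
 * Illustrative assumption: a coupling order is an unordered binary tree
 * over n labeled sub-problems. The number of such trees is
 * (2n-3)!! = 1 * 3 * 5 * ... * (2n-3) for n >= 2, and 1 for n <= 1.
 *
 * The result fits in a uint64_t only for n <= 18; larger n overflows.
 */
uint64_t count_coupling_trees(unsigned n)
{
    uint64_t count = 1;
    /* Adding the k-th leaf to a tree with k-1 leaves can be done in
       2k-3 ways (one per edge of the existing tree, plus a new root). */
    for (unsigned k = 2; k <= n; ++k)
        count *= (uint64_t)(2 * k - 3);
    return count;
}
```

Under this assumption, 4 sub-problems admit 15 coupling trees and 10 sub-problems already admit 34,459,425, which is consistent with the argument that exhaustive search or manual selection quickly becomes impractical.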