Structuring data for concurrent vectorized processing in a transient dynamics finite element program

Most transient dynamics analysis programs employ explicit time integration with a lumped (diagonalized) mass matrix. These programs predict the response of solids to intense dynamic loads, such as crash, penetration, or explosive loads. Explicit integration avoids the solution of coupled matrix equations, which is a great advantage in these highly nonlinear problems. Explicit integration algorithms are naturally well suited to both vector and concurrent processing. However, to realize the advantage of concurrent vector processors, many traditional schemes for managing finite element data in memory must be changed. Optimal performance requires vectorization of as much code as possible. Finite element programs must be designed from a data flow perspective in order to fit comfortably into the architectures of modern computing hardware. A synopsis is presented of how this approach was taken for the production transient dynamics analysis programs PRONTO 2D and PRONTO 3D. There are two points in the explicit integration loop where each finite element cannot be processed independently. The first is the gathering of nodal data (coordinates, displacements, velocities, etc.) via the element connectivity table. The second is the assembly of nodal force contributions from each element. These two processes dominate overall performance. The implications of the hardware architecture of concurrent processors for these crucial areas are discussed. 2 refs.
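The two dependent steps named in the abstract, the gather through the connectivity table and the assembly of element forces, can be illustrated with a minimal sketch. It is not taken from PRONTO 2D or PRONTO 3D. The one-dimensional bar mesh, all function names, and the numerical values are assumptions made for illustration. The greedy grouping of elements that share no nodes is one commonly used way to make assembly free of write conflicts, and the abstract does not say whether the authors used it.

```python
# Illustrative only: a hypothetical 1D mesh of two-node bar elements.
# None of these names or values come from PRONTO 2D / PRONTO 3D.

def gather(connectivity, nodal_values):
    """Gather nodal data into element order via the connectivity table."""
    return [[nodal_values[n] for n in elem] for elem in connectivity]

def element_forces(elem_coords, elem_disp, stiffness):
    """Internal force pair of each bar; every element is independent here."""
    forces = []
    for (x0, x1), (u0, u1) in zip(elem_coords, elem_disp):
        f = stiffness * (u1 - u0) / (x1 - x0)  # axial force, stiffness = EA
        forces.append([-f, f])                 # contribution to node 0, node 1
    return forces

def scatter_add(connectivity, elem_forces, num_nodes):
    """Assemble element contributions into nodal forces."""
    nodal = [0.0] * num_nodes
    for elem, f in zip(connectivity, elem_forces):
        for n, fn in zip(elem, f):
            nodal[n] += fn  # elements sharing a node update the same entry
    return nodal

def color_elements(connectivity):
    """Greedy grouping (assumed technique): no two elements in a group
    share a node, so each group could be assembled without conflicts."""
    groups = []  # each entry: (set of nodes used, list of element indices)
    for e, elem in enumerate(connectivity):
        for nodes, members in groups:
            if nodes.isdisjoint(elem):
                nodes.update(elem)
                members.append(e)
                break
        else:
            groups.append((set(elem), [e]))
    return [members for _, members in groups]

def lumped_mass_accel(f_ext, f_int, mass):
    """With a diagonal mass matrix each node is solved on its own."""
    return [(fe - fi) / m for fe, fi, m in zip(f_ext, f_int, mass)]

# Hypothetical data: 4 nodes, 3 elements.
coords = [0.0, 1.0, 2.0, 3.0]
disp = [0.0, 0.25, 0.75, 0.75]
mass = [0.5, 1.0, 1.0, 0.5]
connectivity = [(0, 1), (1, 2), (2, 3)]

elem_f = element_forces(gather(connectivity, coords),
                        gather(connectivity, disp),
                        stiffness=4.0)
f_int = scatter_add(connectivity, elem_f, len(coords))
accel = lumped_mass_accel([0.0] * len(coords), f_int, mass)
groups = color_elements(connectivity)
```

In this sketch the element force computation has no dependence between elements, whereas `scatter_add` updates shared nodal entries, which is why the assembly step needs care on vector and concurrent hardware. The final division by the nodal mass shows why a lumped mass matrix removes the need for a coupled solve.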