A multiple-GPU based parallel independent coefficient reanalysis method and applications for vehicle design

Highlights
- An independent coefficient (IC) reanalysis method is reconstructed for a multi-GPU platform using MPI and CUDA.
- The data partition for multiple GPUs is implemented so as to achieve good load balance.
- The proposed non-blocking communication strategy achieves higher speedups than the blocking one.
- The GPU memory bottleneck can be overcome.

Abstract
The main limitations of reanalysis methods implemented with CUDA (Compute Unified Device Architecture) for large-scale engineering optimization problems are the limited efficiency of a single GPU and the GPU memory bottleneck. To break through these bottlenecks, an efficient parallel independent coefficient (IC) reanalysis method is developed for a multi-GPU platform. The IC reanalysis procedure is reconstructed to accommodate multiple GPUs: the matrices and vectors are partitioned and distributed to each GPU so that the load is well balanced and communication between GPUs is kept small. This study also proposes an effective technique that overlaps computation and communication through a non-blocking communication strategy, so that the GPUs continue with their subsequent tasks while communication proceeds concurrently. Furthermore, the compressed sparse row (CSR) format is used on each GPU to save memory. Finally, large-scale vehicle design problems are solved with the developed solver. The test results indicate that the multi-GPU IC reanalysis method has the potential to handle real large-scale problems and to shorten the design cycle.
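The abstract states that the matrices and vectors are partitioned across GPUs for load balance and that each GPU stores its part in CSR format, but it does not give the partitioning rule. The Python sketch below assumes contiguous row blocks balanced by nonzero count, a common choice for row-wise sparse matrix-vector products; it is not necessarily the authors' scheme. All function names (`dense_to_csr`, `csr_matvec`, `partition_rows_by_nnz`, `csr_bytes`) are hypothetical, and each row block merely stands in for one GPU.

```python
from bisect import bisect_left


def dense_to_csr(a):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    Returns (values, col_idx, row_ptr): the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i+1]], in columns col_idx[row_ptr[i]:row_ptr[i+1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(float(v))
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x, row_start=0, row_end=None):
    """Compute y = A[row_start:row_end, :] @ x for a CSR matrix A."""
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for i in range(row_start, row_end):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


def partition_rows_by_nnz(row_ptr, n_parts):
    """Split rows into n_parts contiguous blocks with roughly equal nonzeros.

    Assumed scheme (not taken from the paper). Returns bounds of length
    n_parts + 1; part p owns rows [bounds[p], bounds[p + 1]).
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for p in range(1, n_parts):
        target = nnz * p / n_parts
        # First row whose starting offset reaches this share of the nonzeros.
        bounds.append(bisect_left(row_ptr, target, bounds[-1], n_rows))
    bounds.append(n_rows)
    return bounds


def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """CSR storage: nnz values + nnz column indices + (n_rows + 1) row pointers."""
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


if __name__ == "__main__":
    n = 6  # small tridiagonal test matrix
    A = [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(n)]
         for i in range(n)]
    vals, cols, ptr = dense_to_csr(A)
    bounds = partition_rows_by_nnz(ptr, 2)  # [0, 3, 6]: 8 nonzeros per part
    x = [1.0] * n
    y = []
    for p in range(2):  # each "device" computes only its own row block
        y += csr_matvec(vals, cols, ptr, x, bounds[p], bounds[p + 1])
    print(bounds, y)  # [0, 3, 6] [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    print(csr_bytes(n, ptr[-1]), n * n * 8)  # 220 288
```

Balancing by nonzeros rather than by row count matters when row lengths vary, as they typically do in finite element stiffness matrices. `csr_bytes` shows why CSR relieves the memory limit: storage grows with the number of nonzeros rather than with the square of the matrix dimension.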

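The abstract describes overlapping computation with communication through a non-blocking strategy but gives no implementation details. The sketch below is a single-process Python analogue, not MPI or CUDA code: a worker thread from `concurrent.futures` plays the role of a pending non-blocking receive (in MPI terms, roughly `MPI_Irecv` now and `MPI_Wait` later), and `time.sleep` stands in for transfer time. Each device first computes the part of its row block that touches only locally owned vector entries, then waits for the remote (halo) entries and adds the remaining terms. The names `halo_columns`, `receive_halo`, `overlapped_block_matvec` and the `latency_s` parameter are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def halo_columns(col_idx, row_ptr, row_range, col_range):
    """Columns referenced by this row block that are owned by another device."""
    r0, r1 = row_range
    c0, c1 = col_range
    return sorted({col_idx[k] for k in range(row_ptr[r0], row_ptr[r1])
                   if not c0 <= col_idx[k] < c1})


def receive_halo(remote_x, cols, latency_s):
    """Hypothetical stand-in for a non-blocking receive of remote x entries."""
    time.sleep(latency_s)  # simulated transfer time
    return {c: remote_x[c] for c in cols}


def overlapped_block_matvec(values, col_idx, row_ptr, row_range, col_range,
                            x_local, remote_x, latency_s=0.01):
    """y = A[r0:r1, :] @ x, doing local work while halo data is in flight."""
    r0, r1 = row_range
    c0, c1 = col_range
    needed = halo_columns(col_idx, row_ptr, row_range, col_range)
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Post the "receive" (analogous to MPI_Irecv); returns immediately.
        pending = pool.submit(receive_halo, remote_x, needed, latency_s)
        # 2. Interior work: products that use only locally owned x entries.
        y = []
        for i in range(r0, r1):
            s = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if c0 <= j < c1:
                    s += values[k] * x_local[j - c0]
            y.append(s)
        # 3. Complete the "receive" (analogous to MPI_Wait).
        halo = pending.result()
    # 4. Boundary work: add contributions of the received remote entries.
    for i in range(r0, r1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if not c0 <= j < c1:
                y[i - r0] += values[k] * halo[j]
    return y
```

Concatenating the results of all row blocks gives the same product as a blocking version; the only difference is that step 2 runs while the transfer is still pending, which is where the speedup of a non-blocking strategy comes from when the interior work is large relative to the halo.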