Acceleration of parallel computation for derived micro-modeling circuit by exploiting GPU memory bandwidth limit

It has been shown that a newly proposed micro-modeling method for deriving a concise passive circuit of a large-scale EM problem is highly suitable for GPU parallel computation. However, due to the memory bandwidth limit of the GPU, GPU utilization falls far short of peak performance because more than 97% of the processing time is spent on frequent data transactions. This paper proposes an effective strategy for GPU acceleration of the micro-modeling algorithm that significantly reduces the data transactions between the off-chip and on-chip memory of GPUs. A practical numerical example of a large-scale interconnection and packaging problem shows that the proposed strategy is effective and that the parallel computation of the micro-modeling circuit using GPUs can be further accelerated by one order of magnitude if four or more iterative derivation processes are conducted in one run.
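To illustrate the general idea of reducing off-chip/on-chip data traffic by fusing several iterations into a single kernel launch, the following is a minimal CUDA sketch. It is not the paper's actual kernels: the update rule derive_step, the tile size TILE, and the fused iteration count ITERS_PER_LAUNCH are hypothetical placeholders standing in for one iterative derivation step of the micro-modeling method.

```cuda
// Illustrative sketch: fuse several iterative steps into one kernel launch so
// that intermediate results stay in on-chip (shared) memory, with only one
// off-chip (global) read and one off-chip write per launch.
#include <cuda_runtime.h>

#define TILE 256              // elements handled by one thread block (assumed)
#define ITERS_PER_LAUNCH 4    // iterations fused into one kernel run (assumed)

// Placeholder for one iterative derivation step acting on on-chip data.
__device__ float derive_step(float v, float neighbor)
{
    return 0.5f * (v + neighbor);   // hypothetical update rule
}

__global__ void fused_iterations(float* data, int n)
{
    __shared__ float tile[TILE];
    int gid = blockIdx.x * TILE + threadIdx.x;

    // Single off-chip read per launch.
    tile[threadIdx.x] = (gid < n) ? data[gid] : 0.0f;
    __syncthreads();

    // Iterate entirely in on-chip memory instead of one global
    // read/write round trip per iteration.
    for (int it = 0; it < ITERS_PER_LAUNCH; ++it) {
        int left = (threadIdx.x == 0) ? 0 : threadIdx.x - 1;
        float updated = derive_step(tile[threadIdx.x], tile[left]);
        __syncthreads();                // finish all reads before overwriting
        tile[threadIdx.x] = updated;
        __syncthreads();
    }

    // Single off-chip write per launch.
    if (gid < n) data[gid] = tile[threadIdx.x];
}

int main()
{
    const int n = 1 << 20;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // One launch now covers ITERS_PER_LAUNCH iterations, cutting global-memory
    // traffic roughly by that factor compared with one launch per iteration.
    fused_iterations<<<(n + TILE - 1) / TILE, TILE>>>(d_data, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```

The sketch only shows the memory-traffic pattern; the actual derivation step of the micro-modeling circuit involves block-level matrix operations and inter-block dependencies that are not modeled here.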