A novel parallel FDTD algorithm on Non-Uniform Memory Access multiprocessors

Choosing a good thread and data distribution scheme is critical to the performance of data-parallel applications on Non-Uniform Memory Access (NUMA) workstations. In this paper, we introduce a novel parallel finite-difference time-domain (FDTD) algorithm that optimizes application thread affinity on a NUMA workstation. The algorithm demonstrated excellent performance on both an ideal test case and an inverted-F antenna example.
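The abstract does not spell out the distribution scheme. The sketch below illustrates one widely used way to combine thread affinity with data placement on NUMA machines, offered as a hypothetical 1D example rather than the authors' algorithm. Each worker thread is pinned to a CPU. Each thread then first-touches the slab of the grid it will later update, because Linux's default policy places a page on the node of the CPU that first writes it. Finally, the H and E half-steps are separated by barriers. The names `partition`, `pin_current_thread` and `fdtd_1d`, the Courant number, and the Gaussian source are all illustrative assumptions. CPython threads share the global interpreter lock, so this shows the structure rather than a speedup. A production solver would express the same pattern in C with OpenMP or pthreads.

```python
import math
import mmap
import os
import threading

COURANT = 0.5  # hypothetical normalized Courant number S = c*dt/dx (1D stability needs S <= 1)


def partition(n, nthreads):
    """Split cell indices [0, n) into contiguous slabs, one per thread."""
    base, extra = divmod(n, nthreads)
    bounds, start = [], 0
    for t in range(nthreads):
        stop = start + base + (1 if t < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def pin_current_thread(cpu):
    """Bind the calling thread to one CPU if the OS allows it.

    On Linux, pid 0 in sched_setaffinity refers to the calling thread only.
    """
    if hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # e.g. a restricted sandbox: keep the inherited affinity


def fdtd_1d(n_cells, n_steps, nthreads, src_cell):
    """Hypothetical 1D Yee-scheme FDTD with slab-per-thread ownership; returns final Ez."""
    # Anonymous mappings get physical pages only when first written, so the
    # thread that first writes a page decides its NUMA node (first touch).
    ez = memoryview(mmap.mmap(-1, 8 * n_cells)).cast("d")
    hy = memoryview(mmap.mmap(-1, 8 * n_cells)).cast("d")
    slabs = partition(n_cells, nthreads)
    barrier = threading.Barrier(nthreads)
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = [0]

    def worker(t):
        pin_current_thread(cpus[t % len(cpus)])
        lo, hi = slabs[t]
        for i in range(lo, hi):  # first touch of this thread's own slab
            ez[i] = 0.0
            hy[i] = 0.0
        barrier.wait()
        for step in range(n_steps):
            # H half-step: Hy[i] reads Ez[i+1], which may sit in the next slab.
            for i in range(lo, min(hi, n_cells - 1)):
                hy[i] += COURANT * (ez[i + 1] - ez[i])
            barrier.wait()  # every Hy is ready before any Ez update reads Hy[i-1]
            # E half-step: Ez[0] stays at zero (a simple PEC wall).
            for i in range(max(lo, 1), hi):
                ez[i] += COURANT * (hy[i] - hy[i - 1])
            if lo <= src_cell < hi:  # soft Gaussian source, added by its owning thread
                ez[src_cell] += math.exp(-(((step - 30) / 10.0) ** 2))
            barrier.wait()  # every Ez is ready before the next Hy update

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return ez.tolist()
```

Running it with one thread and with four threads yields identical fields, since each cell sees the same operations in the same order. Only the placement of pages and threads differs. A slab boundary that falls inside a 4 KiB page leaves that page to whichever thread touches it first, so a real implementation would align slabs to page boundaries.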
