Accelerating FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture

The coupling of microwaves into apertures plays an important part in many fields of electromagnetic physics and engineering. When the apertures are very narrow, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. Intel's Many Integrated Core (MIC) architecture is a many-core architecture that provides 512-bit vector units and more than 200 hardware threads. In this paper, we parallelize the FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture. In the implementation, the OpenMP parallel programming model is used to exploit thread parallelism, while loop unrolling and SIMD intrinsic functions are used to achieve vectorization. Compared with the serial version on an Intel Xeon E5-2670 CPU, the implementation on a 57-core MIC coprocessor achieves a speedup of 11.57 times. The experimental results also demonstrate that the parallelization scales well in performance. Additionally, we report how the binding between OpenMP threads and MIC hardware threads influences performance.
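
The combination of thread parallelism and vectorization described above can be illustrated with a minimal sketch. It is not the implementation from this paper: it assumes a one-dimensional Yee-style update in normalized units, the names `fdtd_step_1d`, `ez`, `hy`, and `courant` are hypothetical, and the portable OpenMP `simd` directive stands in for the loop unrolling and SIMD intrinsic functions mentioned in the abstract.

```c
#include <assert.h>

/* One leapfrog time step of a 1D Yee-style FDTD update (illustrative only).
 * ez, hy  : electric and magnetic field arrays of length n (hypothetical names)
 * courant : normalized coefficient c*dt/dx, assumed <= 1 for stability
 * Assumption: normalized units, no sources, no absorbing boundary. */
void fdtd_step_1d(float *ez, float *hy, int n, float courant)
{
    assert(ez && hy && n >= 2);

    /* H update: each hy[i] reads only ez, so iterations are independent
     * and can be split across threads and packed into vector lanes. */
    #pragma omp parallel for simd
    for (int i = 0; i < n - 1; ++i)
        hy[i] += courant * (ez[i + 1] - ez[i]);

    /* E update: each ez[i] reads only hy. The end cells ez[0] and ez[n-1]
     * are left untouched, acting as fixed-value walls in this sketch. */
    #pragma omp parallel for simd
    for (int i = 1; i < n - 1; ++i)
        ez[i] += courant * (hy[i] - hy[i - 1]);
}
```

Because each loop writes one field while reading only the other, no synchronization is needed inside a loop; the implicit barrier at the end of each parallel loop separates the two half-steps. How OpenMP threads are bound to hardware threads is usually set outside the code, for example with the standard `OMP_PROC_BIND` environment variable; the settings used in the reported experiments are not given in this abstract.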
