Paper information - A comparison of the FDTD algorithm implemented on an integrated GPU versus a GPU configured as a co-processor

GPUs are commonly configured as slave devices that receive FDTD task information and data from the host computer over a Peripheral Component Interconnect Express (PCIe) bus. The Accelerated Processing Unit (APU), by contrast, combines an integrated GPU and several conventional CPU cores on the same integrated circuit die. The FDTD method is implemented on the APU's integrated GPU using the DirectCompute application programming interface and compared against an FDTD implementation on a GPU configured as a co-processor on a PCIe bus. The FDTD method is also implemented in parallel on the APU using the vector processing capability of its CPU cores. Placing the GPU and CPU cores on the same die has the potential to allow the FDTD method to be processed concurrently on both the GPU and the multi-core processor.
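To illustrate why the FDTD method maps well onto GPUs and vector units, here is a minimal one-dimensional leapfrog update in normalized units. It is only a sketch and not the authors' DirectCompute or vectorized code: the function name `fdtd_1d`, the grid size, the Courant number of 0.5, the Gaussian soft source, and the perfectly conducting end walls are all assumptions made for this example.

```python
import math

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=100):
    """Illustrative 1D Yee-style leapfrog update (hypothetical helper)."""
    ez = [0.0] * n_cells          # electric field at integer grid points
    hy = [0.0] * (n_cells - 1)    # magnetic field at half grid points
    for t in range(n_steps):
        # H update: every cell reads only its two neighbouring E values,
        # so all iterations of this loop are independent of each other.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # E update on interior points; the end points are never written,
        # which models perfectly conducting walls (assumed boundary).
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft Gaussian source added at one grid point (assumed excitation).
        ez[src] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez, hy

ez, hy = fdtd_1d()
print(max(abs(v) for v in ez))
```

Within a time step, each inner loop touches every cell independently, which is the property that lets the update be spread across GPU threads or CPU vector lanes. Between time steps the fields must be synchronized, so where the field arrays reside relative to the processor affects the data-movement cost.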

D. B. Davidson | R. G. Ilgner
