Large-Scale Algorithm Design for Parallel FFT-based Simulations on GPUs

We describe and analyze an algorithm–software co-design for a high-performance partial differential equation (PDE) solver targeting large-scale datasets. Large-scale scientific simulations built on parallel Fast Fourier Transforms (FFTs) have extreme memory requirements and high communication costs, which hampers high-resolution analysis on fine grids. Moreover, legacy Fortran scientific codes are difficult to accelerate on modern hardware such as GPUs because of the limited memory capacity of GPUs. Our proposed solution applies signal-processing techniques, such as lossy compression and domain-local FFTs, to lower per-iteration cost without adversely impacting the accuracy of the result. In this work, we discuss proof-of-concept results for various aspects of the algorithm's development.
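As a rough illustration of the kind of lossy, FFT-based compression the abstract alludes to (a generic sketch, not the paper's actual implementation), the snippet below keeps only the largest-magnitude Fourier coefficients of a smooth field and measures the reconstruction error; the `keep_ratio` parameter and helper names are assumptions for illustration only.

```python
import numpy as np

def compress_fft(signal, keep_ratio=0.05):
    """Lossy compression: retain only the largest-magnitude FFT coefficients.

    Illustrative sketch; real solvers would also quantize and pack the
    surviving coefficients to realize the memory savings.
    """
    coeffs = np.fft.rfft(signal)
    n_keep = max(1, int(len(coeffs) * keep_ratio))
    # Indices of the n_keep largest-magnitude coefficients
    idx = np.argsort(np.abs(coeffs))[-n_keep:]
    sparse = np.zeros_like(coeffs)
    sparse[idx] = coeffs[idx]
    return sparse

def decompress_fft(sparse, n):
    """Inverse transform back to a length-n real signal."""
    return np.fft.irfft(sparse, n=n)

# A smooth field dominated by a few low-frequency modes
x = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)
signal = np.sin(x) + 0.5 * np.sin(3.0 * x)

recon = decompress_fft(compress_fft(signal), len(signal))
err = np.linalg.norm(recon - signal) / np.linalg.norm(signal)
print(f"relative reconstruction error: {err:.2e}")
```

For fields whose energy concentrates in a few Fourier modes, as is common for smooth PDE solutions, such truncation discards most coefficients while keeping the relative error small, which is the trade-off the proposed co-design exploits.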