AutoFFT: a template-based FFT codes auto-generation framework for ARM and X86 CPUs
Xiao Wang | Yunquan Zhang | Zhihao Li | Liang Yuan | Tun Chen | Haipeng Jia | Luning Cao
[1] Dragan Mirkovic, et al. An adaptive software library for fast Fourier transforms, 2000, ICS '00.
[2] Chunye Gong, et al. An efficient parallel solution for Caputo fractional reaction–diffusion equation, 2014, The Journal of Supercomputing.
[3] L. Johnsson, et al. UHFFT: A High Performance DFT Framework, 2007.
[4] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[5] Dhairya Malhotra, et al. AccFFT: A library for distributed-memory FFT on CPU and GPU architectures, 2015, ArXiv.
[6] Anthony Blake, et al. Dynamically Generating FFT Code, 2014, Journal of Signal Processing Systems.
[7] Xiao Wang, et al. Efficient parallel optimizations of a high-performance SIFT on GPUs, 2019, J. Parallel Distributed Comput.
[8] S. Lennart Johnsson, et al. Adaptive Computation of Self Sorting In-Place FFTs on Hierarchical Memory Architectures, 2007, HPCC.
[9] J. Tukey, et al. An algorithm for the machine calculation of complex Fourier series, 1965.
[10] Endong Wang, et al. Intel Math Kernel Library, 2014.
[11] Cris Cecka, et al. Low Communication FMM-Accelerated FFT on GPUs, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[13] Peter D. Welch, et al. The Fast Fourier Transform and Its Applications, 1969.
[14] Franz Franchetti, et al. Formal loop merging for signal transforms, 2005, PLDI '05.
[15] Zhibin Chen, et al. Accurate simulation of turbulent phase screen using optimization method, 2019, Optik (Stuttgart).
[16] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[17] G. Bruun. z-transform DFT filters and FFT's, 1978.
[18] Satoshi Matsuoka, et al. High performance 3-D FFT using multiple CUDA GPUs, 2012, GPGPU-5.
[19] David A. Padua, et al. SPL: a language and compiler for DSP algorithms, 2001, PLDI '01.
[20] T. Parks, et al. A prime factor FFT algorithm using high-speed convolution, 1977.
[21] Steven G. Johnson, et al. The Fastest Fourier Transform in the West, 1997.
[22] Naga K. Govindaraju, et al. Auto-tuning of fast Fourier transform on graphics processors, 2011, PPoPP '11.
[23] Pedro Costa, et al. A FFT-based finite-difference solver for massively-parallel direct numerical simulations of turbulent flows, 2018, Comput. Math. Appl.
[24] Paul N. Swarztrauber, et al. Vectorizing the FFTs, 1982.
[25] L. Bluestein. A linear filtering approach to the computation of discrete Fourier transform, 1970.
[26] C. Rader, et al. A new principle for fast Fourier transformation, 1976.
[27] C. Rader. Discrete Fourier transforms when the number of data samples is prime, 1968.
[28] Dan Petre, et al. OpenCL™ FFT Optimizations for Intel® Processor Graphics, 2016, IWOCL.
[29] Yiqun Liu, et al. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs, 2013, Journal of Computer Science and Technology.
[30] Doru-Thom Popovici, et al. Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[31] Chunye Gong, et al. A parallel algorithm for the Riesz fractional reaction-diffusion equation with explicit finite difference method, 2013.
[32] Franz Franchetti, et al. Discrete Fourier transform on multicore, 2009, IEEE Signal Processing Magazine.
[33] P. Duhamel, et al. 'Split radix' FFT algorithm, 1984.
[34] Thomas G. Stockham, et al. High-speed convolution and correlation, 1966, AFIPS '66 (Spring).
[35] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.
[36] Tze Meng Low, et al. SPIRAL: Extreme Performance Portability, 2018, Proceedings of the IEEE.