MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs
Yiqun Liu | Yunquan Zhang | Guoping Long | Haipeng Jia | Yan Li | YunSheng Zhang