Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW
[1] Jeremy G. Siek,et al. The Matrix Template Library: A Generic Programming Approach to High Performance Numerical Linear Algebra , 1998, ISCOPE.
[2] P. Yip,et al. Discrete Cosine Transform: Algorithms, Advantages, Applications , 1990 .
[3] William Kahan,et al. Document for the Basic Linear Algebra Subprograms (BLAS) standard: BLAS Technical Forum , 2001 .
[4] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[5] Todd L. Veldhuizen,et al. Using C++ template metaprograms , 1996 .
[6] PeiZong Lee,et al. An efficient prime-factor algorithm for the discrete cosine transform and its hardware implementations , 1994, IEEE Trans. Signal Process.
[7] P. Yip,et al. The decimation-in-frequency algorithms for a family of discrete sine and cosine transforms , 1988 .
[8] Katherine A. Yelick,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, SIAM Conference on Parallel Processing for Scientific Computing.
[9] A.V. Oppenheim,et al. Analysis of linear digital networks , 1975, Proceedings of the IEEE.
[10] Guoan Bi,et al. DCT algorithms for composite sequence lengths , 1998, IEEE Trans. Signal Process.
[11] G.S. Moschytz,et al. Practical fast 1-D DCT algorithms with 11 multiplications , 1989, International Conference on Acoustics, Speech, and Signal Processing.
[12] E. Im,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, PPSC.
[13] PeiZong Lee,et al. Restructured recursive DCT and DST algorithms , 1994, IEEE Trans. Signal Process.
[14] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[15] Matteo Frigo,et al. A fast Fourier transform compiler , 1999, SIGP.
[16] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.
[17] Erik Elmroth,et al. Applying recursion to serial and parallel QR factorization leads to better performance , 2000, IBM J. Res. Dev..
[18] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[19] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[20] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[21] Graham A. Jullien,et al. Recursive algorithms for the forward and inverse discrete cosine transform with arbitrary length , 1994, IEEE Signal Processing Letters.
[22] Zhao Zhijin,et al. Recursive algorithms for discrete cosine transform , 1996, Proceedings of Third International Conference on Signal Processing (ICSP'96).
[23] Dennis Gannon,et al. Active Libraries: Rethinking the roles of compilers and libraries , 1998, ArXiv.
[24] Eun Im,et al. Optimizing the Performance of Sparse Matrix-Vector Multiplication , 2000 .
[25] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997 .
[26] Daniel Pak-Kong Lun. On efficient software realization of the prime factor discrete cosine transform , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[27] Jack J. Dongarra,et al. An extended set of FORTRAN basic linear algebra subprograms , 1988, TOMS.
[28] William H. Press,et al. Numerical recipes in C , 2002 .
[29] A. W. M. van den Enden,et al. Discrete Time Signal Processing , 1989 .
[30] John B. Shoven,et al. I , Edinburgh Medical and Surgical Journal.
[31] Lap-Pui Chau,et al. Recursive algorithm for the discrete cosine transform with general lengths , 1994 .