Achieving accurate and context‐sensitive timing for code optimization
[1] David A. Patterson,et al. Computer Architecture: A Quantitative Approach, 1969.
[2] Steven G. Johnson,et al. The Fastest Fourier Transform in the West, 1997.
[3] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[4] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[5] Alan Jay Smith,et al. Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes , 1995, IEEE Trans. Computers.
[6] Carl Staelin,et al. lmbench: Portable Tools for Performance Analysis , 1996, USENIX Annual Technical Conference.
[7] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[8] R. C. Whaley,et al. Timing high performance kernels through empirical compilation , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[9] Michael F. P. O'Boyle,et al. Feedback Assisted Iterative Compilation, 2000.
[10] T. Kisuki,et al. Iterative Compilation in Program Optimization, 2000.
[11] Paul van der Mark,et al. Using Iterative Compilation for Managing Software Pipeline-Unrolling Trade-offs, 1999.
[12] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[13] Michael F. P. O'Boyle,et al. Iterative Compilation , 2002, Embedded Processor Design Challenges.
[14] Keshav Pingali,et al. Automatic measurement of memory hierarchy parameters , 2005, SIGMETRICS '05.
[15] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput.
[16] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.