Analytical Performance Prediction for Evaluation and Tuning of GPGPU Applications
Wen-mei W. Hwu | Sara S. Baghsorkhi | Matthieu Delahaye | William D. Gropp
[1] Marc Snir,et al. Automatic tuning matrix multiplication performance on graphics hardware , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[2] Michael Wolfe,et al. Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form , 1995, TOPLAS.
[3] Rudolf Eigenmann,et al. Fast and effective orchestration of compiler optimizations for automatic performance tuning , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[4] Ken Kennedy,et al. Automatic tuning of whole applications using direct search and a performance-based transformation system , 2006, The Journal of Supercomputing.
[5] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[6] John M. Mellor-Crummey,et al. Cross-architecture performance predictions for scientific applications using parameterized models , 2004, SIGMETRICS '04/Performance '04.
[7] Naga K. Govindaraju,et al. High performance discrete Fourier transforms on graphics processors , 2008, HiPC 2008.
[8] Pat Hanrahan,et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication , 2004, Graphics Hardware.
[9] Guy E. Blelloch,et al. Prefix sums and their applications , 1990.
[10] Joe D. Warren,et al. The program dependence graph and its use in optimization , 1984, TOPLAS.
[11] N.K. Govindaraju,et al. A Memory Model for Scientific Algorithms on Graphics Processors , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[12] James E. Smith,et al. A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[13] Jens H. Krüger,et al. A Survey of General‐Purpose Computation on Graphics Hardware , 2007, Eurographics.
[14] Michael Wolfe,et al. Beyond induction variables , 1992, PLDI '92.
[15] David I. August,et al. Compiler optimization-space exploration , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[16] Wen-mei W. Hwu,et al. Program optimization space pruning for a multithreaded gpu , 2008, CGO '08.
[17] Mark J. Clement,et al. Analytical performance prediction on multicomputers , 1993, Supercomputing '93. Proceedings.
[18] Weiguo Liu,et al. Performance Predictions for General-Purpose Computation on GPUs , 2007, 2007 International Conference on Parallel Processing (ICPP 2007).
[19] Mark N. Wegman,et al. Efficiently computing static single assignment form and the control dependence graph , 1991, TOPLAS.
[20] Chen Ding,et al. Miss rate prediction across all program inputs , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[21] Rudolf Eigenmann,et al. Fast, automatic, procedure-level performance tuning , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[22] R. C. Whaley,et al. Timing high performance kernels through empirical compilation , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[23] Jeanne Ferrante,et al. The program dependence graph and its use in optimization , 1987.