Auto-tuning full applications: A case study
Chun Chen | Ananta Tiwari | Daniel J. Quinlan | Jacqueline Chame | Jeffrey K. Hollingsworth | Chunhua Liao | Mary W. Hall
[1] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[2] David A. Padua,et al. SPL: a language and compiler for DSP algorithms , 2001, PLDI '01.
[3] David Parello,et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.
[4] Chun Chen,et al. A Programming Language Interface to Describe Transformations and Code Generation , 2010, LCPC.
[5] I-Hsin Chung,et al. Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[6] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl.
[7] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[8] P. Sadayappan,et al. Annotation-based empirical performance tuning using Orio , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[9] Bronis R. de Supinski,et al. A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries , 2010, IWOMP.
[10] Daniel J. Quinlan,et al. Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions , 2010, International Journal of Parallel Programming.
[11] Michael F. P. O'Boyle,et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[12] Vahid Tabatabaee,et al. Parallel Parameter Tuning for Applications with Performance Variability , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[13] Chuan Lu,et al. Simulating subsurface flow and transport on ultrascale computers using PFLOTRAN , 2007.
[14] Chun Chen,et al. Model-guided empirical optimization for memory hierarchy , 2007.
[15] José Nelson Amaral,et al. Ablego: a function outlining and partial inlining framework , 2007, Softw. Pract. Exp.
[16] Katherine Yelick,et al. Performance Engineering: Understanding and Improving the Performance of Large-Scale Codes , 2007.
[17] Ken Kennedy,et al. Automatic tuning of whole applications using direct search and a performance-based transformation system , 2006, The Journal of Supercomputing.
[18] I-Hsin Chung,et al. A Case Study Using Automatic Performance Tuning for Large-Scale Scientific Programs , 2006, 2006 15th IEEE International Conference on High Performance Distributed Computing.
[19] Albert Cohen,et al. Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[20] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[21] Nathan R. Tallent,et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs , 2010, Concurr. Comput. Pract. Exp.
[22] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[23] José Nelson Amaral,et al. Ablego: a function outlining and partial inlining framework: Research Articles , 2007.
[24] Albert Cohen,et al. Iterative Optimization in the Polyhedral Model: Part II, Multidimensional Time , 2008, PLDI '08.
[25] Samuel Williams,et al. Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[26] Richard W. Vuduc,et al. POET: Parameterized Optimizations for Empirical Tuning , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[27] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005.
[28] Arun Lakhotia,et al. Restructuring programs by tucking statements into functions , 1998, Inf. Softw. Technol.
[29] Robert D. Falgout,et al. Semicoarsening Multigrid on Distributed Memory Machines , 1999, SIAM J. Sci. Comput.
[30] Yoon-Ju Lee,et al. A Code Isolator: Isolating Code Fragments from Large Programs , 2004, LCPC.
[31] Richard W. Vuduc,et al. Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization , 2009, LCPC.