Automating Compiler-Directed Autotuning for Phased Performance Behavior
[1] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[2] David A. Padua,et al. SPL: a language and compiler for DSP algorithms , 2001, PLDI '01.
[3] Samuel Williams,et al. Optimization of geometric multigrid for emerging multi- and manycore processors , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] A Thesis,et al. Tiling Stencil Computations to Maximize Parallelism , 2013 .
[5] Michael Wolfe,et al. Loop skewing: The wavefront method revisited , 1986, International Journal of Parallel Programming.
[6] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[7] R. C. Whaley,et al. ATLAS (Automatically Tuned Linear Algebra Software) , 2011, Encyclopedia of Parallel Computing.
[8] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[9] Alan Edelman,et al. Autotuning multigrid with PetaBricks , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[10] Prasanna Balaprakash,et al. Generating Efficient Tensor Contractions for GPUs , 2015, 2015 44th International Conference on Parallel Processing.
[11] J. Ramanujam,et al. A framework for enhancing data reuse via associative reordering , 2014, PLDI.
[12] Matteo Frigo. A Fast Fourier Transform Compiler , 1999, PLDI.
[13] Shoaib Ashraf Kamil,et al. Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages , 2012 .
[14] P. Sadayappan,et al. Annotation-based empirical performance tuning using Orio , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[15] Hongbo Rong,et al. Automating Wavefront Parallelization for Sparse Matrix Computations , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[17] Mary W. Hall,et al. Non-affine Extensions to Polyhedral Code Generation , 2014, CGO '14.
[18] Protonu Basu,et al. Compiler Optimizations and Autotuning for Stencils and Geometric Multigrid , 2016 .
[19] Bradley C. Kuszmaul,et al. The pochoir stencil compiler , 2011, SPAA '11.
[20] Mary W. Hall,et al. Loop and data transformations for sparse matrix code , 2015, PLDI.
[21] I-Hsin Chung,et al. Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[22] Samuel Williams,et al. Compiler generation and autotuning of communication-avoiding operators for geometric multigrid , 2013, 20th Annual International Conference on High Performance Computing.
[23] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[24] Samuel Williams,et al. Compiler-Directed Transformation for Higher-Order Stencils , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[25] Mary W. Hall,et al. CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .
[26] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[27] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .