PATUS: A Code Generation and Auto-Tuning Framework For Parallel Stencil Computations
暂无分享,去创建一个
[1] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[2] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[3] Rudolf Eigenmann,et al. PEAK—a fast and effective performance tuning system via compiler optimization orchestration , 2008, TOPL.
[4] Chau-Wen Tseng,et al. Tiling Optimizations for 3D Scientific Computations , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[5] Bradley C. Kuszmaul,et al. The pochoir stencil compiler , 2011, SPAA '11.
[6] Zhiyuan Li,et al. Automatic tiling of iterative stencil loops , 2004, TOPL.
[7] Sanjay V. Rajopadhye,et al. Parameterized tiled loops for free , 2007, PLDI '07.
[8] Dhabaleswar K. Panda,et al. Scalable Earthquake Simulation on Petascale Supercomputers , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Kevin Skadron,et al. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations , 2011, International Journal of Parallel Programming.
[10] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[11] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[12] Rudolf Eigenmann,et al. Cetus: A Source-to-Source Compiler Infrastructure for Multicores , 2009, Computer.
[13] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[14] Peter Messmer,et al. Parallel data-locality aware stencil computations on modern micro-architectures , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[15] Robert Strzodka,et al. Time skewing made simple , 2011, PPoPP '11.
[16] Gerhard Wellein,et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization , 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[17] Volker Strumpen,et al. Cache oblivious stencil computations , 2005, ICS '05.
[18] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[19] Zhiyuan Li,et al. A Compiler Framework for Tiling Imperfectly-Nested Loops , 1999, LCPC.
[20] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..