A Multi-level Optimization Strategy to Improve the Performance of Stencil Computation
[1] Helmar Burkhart, et al. Automatic code generation and tuning for stencil kernels on modern shared memory architectures, 2011, Computer Science - Research and Development.
[2] Uday Bondhugula, et al. Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors, 2009, PPoPP '09.
[3] Philippe Olivier Alexandre Navaux, et al. Seismic wave propagation simulations on low-power and performance-centric manycores, 2016, Parallel Comput.
[4] Chau-Wen Tseng, et al. Tiling Optimizations for 3D Scientific Computations, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[5] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[6] Dietmar Fey, et al. High Performance Stencil Code Algorithms for GPGPUs, 2011, ICCS.
[7] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[8] Weiqiang Wang, et al. A Multilevel Parallelization Framework for High-Order Stencil Computations, 2009, Euro-Par.
[9] Werner Augustin, et al. Optimized Stencil Computation Using In-Place Calculation on Modern Multicore Systems, 2009, Euro-Par.
[10] Samuel Williams, et al. Auto-Tuning the 27-point Stencil for Multicore, 2009.
[11] Bradley C. Kuszmaul, et al. The Pochoir stencil compiler, 2011, SPAA '11.
[12] David E. Keyes, et al. Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates, 2014, SIAM J. Sci. Comput.
[13] Junichiro Makino, et al. Optimal Temporal Blocking for Stencil Computation, 2015, ICCS.
[14] Alejandro Duran, et al. Extending OpenMP* with Vector Constructs for Modern Multicore SIMD Architectures, 2012, IWOMP.
[15] Marcin Dabrowski, et al. Efficient 3D stencil computations using CUDA, 2013, Parallel Comput.
[16] Volker Strumpen, et al. Cache oblivious stencil computations, 2005, ICS '05.