Modeling the Performance of Geometric Multigrid Stencils on Multicore Computer Architectures
[1] Samuel Williams, et al. Optimization of geometric multigrid for emerging multi- and manycore processors, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] James Demmel, et al. Avoiding communication in sparse matrix computations, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[3] Pen-Chung Yew, et al. Tile size selection revisited, 2013, ACM Trans. Archit. Code Optim.
[4] Gerhard Wellein, et al. Efficient multicore-aware parallelization strategies for iterative stencil computations, 2010, J. Comput. Sci.
[5] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[6] Anne Greenbaum, et al. Iterative methods for solving linear systems, 1997, Frontiers in applied mathematics.
[7] James Demmel, et al. Minimizing communication in sparse matrix solvers, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[8] Uday Bondhugula, et al. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, 2008, CC.
[9] Richard Barrett, et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 1994, Other Titles in Applied Mathematics.
[10] William L. Briggs, et al. A multigrid tutorial, 1987.
[11] Gabriel H. Loh, et al. 3D-Stacked Memory Architectures for Multi-core Processors, 2008, 2008 International Symposium on Computer Architecture.
[12] Ulrich Rüde, et al. Fixed and Adaptive Cache Aware Algorithms for Multigrid Methods, 2000.
[13] Gerhard Wellein, et al. Introduction to High Performance Computing for Scientists and Engineers, 2010, Chapman and Hall / CRC computational science series.
[14] Howard C. Elman, et al. Finite Elements and Fast Iterative Solvers: with Applications in Incompressible Fluid Dynamics, 2014.
[15] Jack Dongarra, et al. Scheduling dense linear algebra operations on multicore processors, 2010.
[16] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[17] Craig C. Douglas, et al. Caching in with Multigrid Algorithms: Problems in Two Dimensions, 1996, Parallel Algorithms Appl.
[18] Jan Treibig, et al. Efficiency improvements of iterative numerical algorithms on modern architectures, 2008.
[19] Gerhard Wellein, et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization, 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[20] Christian Lengauer, et al. Loop Parallelization in the Polytope Model, 1993, CONCUR.
[21] Larry Carter, et al. Sparse Tiling for Stationary Iterative Methods, 2004, Int. J. High Perform. Comput. Appl.
[22] Andrew Selle, et al. Efficient elasticity for character skinning with contact and collisions, 2011, SIGGRAPH 2011.
[23] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[24] Helmar Burkhart, et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[25] Larry Carter, et al. An approach for code generation in the Sparse Polyhedral Framework, 2016, Parallel Comput.
[26] Yolande Berbers, et al. Efficient Synchronization for Stencil Computations Using Dynamic Task Graphs, 2013, ICCS.
[27] Bradley C. Kuszmaul, et al. The Pochoir stencil compiler, 2011, SPAA '11.
[28] Mark Hoemmen, et al. Communication-avoiding Krylov subspace methods, 2010.
[29] Wim Vanroose, et al. Improving the arithmetic intensity of multigrid with the help of polynomial smoothers, 2012, Numer. Linear Algebra Appl.
[30] Ulrich Rüde, et al. Cache-Aware Multigrid Methods for Solving Poisson's Equation in Two Dimensions, 2000, Computing.
[31] Ulrich Rüde, et al. Cache Optimization for Structured and Unstructured Grid Multigrid, 2000.
[32] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[33] David A. Patterson, et al. Computer Organization and Design, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design), 2008.
[34] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.
[35] Uday Bondhugula, et al. Tiling stencil computations to maximize parallelism, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.