Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model
[1] David G. Wonnacott, et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[2] Christian Lengauer, et al. Optimization of two Jacobi Smoother Kernels by Domain-Specific Program Transformation, 2014.
[3] Samuel Williams, et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures, 2008.
[4] Gerhard Wellein, et al. Introduction to High Performance Computing for Scientists and Engineers, 2010, Chapman and Hall / CRC computational science series.
[5] Samuel Williams, et al. An auto-tuning framework for parallel multicore stencil computations, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[6] Gerhard Wellein, et al. Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters, 2010, Parallel Process. Lett.
[7] Alfred V. Aho, et al. Compilers: Principles, Techniques, & Tools with Gradiance, 2007.
[8] Katherine Yelick, et al. Auto-tuning stencil codes for cache-based multicore platforms, 2009.
[9] Gerhard Wellein, et al. Exploring performance and power properties of modern multi‐core chips via simple machine models, 2012, Concurr. Comput. Pract. Exp.
[10] Matthias Bolten, et al. Multigrid methods for structured grids and their application in particle simulation, 2008.
[11] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[12] Pradeep Dubey, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[14] Alfred V. Aho, et al. Compilers: Principles, Techniques, and Tools, 1986, Addison-Wesley series in computer science / World student series edition.
[15] Gerhard Wellein, et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization, 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.
[16] Volker Strumpen, et al. Cache oblivious stencil computations, 2005, ICS '05.
[17] Gerhard Wellein, et al. Efficient multicore-aware parallelization strategies for iterative stencil computations, 2010, J. Comput. Sci.
[18] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[19] Georg Hager, et al. Introducing a Performance Model for Bandwidth-Limited Loop Kernels, 2009, PPAM.
[20] Weiqiang Wang, et al. In-Core Optimization of High-Order Stencil Computations, 2009, PDPTA.
[21] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.