Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures
[1] Barbara M. Chapman,et al. Performance Oriented Programming for NUMA Architectures , 2001, WOMPAT.
[2] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Chris Johnson,et al. Data Distribution, Migration and Replication on a cc-NUMA Architecture , 2002.
[4] Barbara M. Chapman,et al. Enabling locality-aware computations in OpenMP , 2010, Sci. Program..
[5] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[6] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[7] Robert J. Fowler,et al. NUMA policies and their relation to memory architecture , 1991, ASPLOS IV.
[8] J. Shalf,et al. Auto-Tuning the 27-point Stencil for Multicore , 2009, Lawrence Berkeley National Laboratory Recent Work.
[9] Torsten Hoefler,et al. NUMA-aware shared-memory collective communication for MPI , 2013, HPDC.
[10] Chau-Wen Tseng,et al. Tiling Optimizations for 3D Scientific Computations , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[11] David A. Padua,et al. Compiler Techniques for the Distribution of Data and Computation , 2003, IEEE Trans. Parallel Distributed Syst..
[12] Lixia Liu,et al. Improving parallelism and locality with asynchronous algorithms , 2010, PPoPP '10.
[13] Cheng Wang,et al. Data locality enhancement by memory reduction , 2001, ICS '01.
[14] Joseph Antony,et al. Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport , 2006, HiPC.
[15] Peter Messmer,et al. Parallel data-locality aware stencil computations on modern micro-architectures , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[16] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[17] Eduard Ayguadé,et al. Is Data Distribution Necessary in OpenMP? , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[18] Robert Strzodka,et al. NUMA Aware Iterative Stencil Computations on Many-Core Systems , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[19] Qing Yi,et al. POET: a scripting language for applying parameterized source‐to‐source program transformations , 2012, Softw. Pract. Exp..
[20] Jonathan Harris,et al. Extending OpenMP For NUMA Machines , 2000, ACM/IEEE SC 2000 Conference (SC'00).