Understanding stencil code performance on multicore architectures
Apan Qasem | Faizur Rahman | Qing Yi
[1] Stéphane Eranian. What can performance counters do for memory subsystem analysis? , 2008, MSPC '08.
[2] Bilha Mendelson,et al. Detecting Change in Program Behavior for Adaptive Optimization , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[3] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Jeffrey K. Hollingsworth,et al. Using Hardware Counters to Automatically Improve Memory Performance , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[5] Jichi Guo,et al. Automated empirical tuning of scientific codes for performance and power consumption , 2011, HiPEAC.
[6] Nathan R. Tallent,et al. Effective performance measurement and analysis of multithreaded applications , 2009, PPoPP '09.
[7] Samuel Williams,et al. Implicit and explicit optimizations for stencil computations , 2006, MSPC '06.
[8] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[9] Peter Messmer,et al. Parallel data-locality aware stencil computations on modern micro-architectures , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[10] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[11] Nathan R. Tallent,et al. Effective performance measurement and analysis of multithreaded applications , 2009.
[12] Basilio B. Fraguela,et al. Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[13] Randy Allen,et al. Optimizing Compilers for Modern Architectures , 2004.
[14] Lixia Liu,et al. Improving parallelism and locality with asynchronous algorithms , 2010, PPoPP '10.
[15] Sally A. McKee,et al. Real time power estimation and thread scheduling via performance counters , 2009, CARN.
[16] Gerhard Wellein,et al. Efficient multicore-aware parallelization strategies for iterative stencil computations , 2010, J. Comput. Sci..
[17] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[18] Chau-Wen Tseng,et al. Tiling Optimizations for 3D Scientific Computations , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[19] Nathan R. Tallent,et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs , 2010, Concurr. Comput. Pract. Exp..
[20] John M. Mellor-Crummey,et al. Pinpointing and Exploiting Opportunities for Enhancing Data Reuse , 2008, ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software.
[21] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[22] Richard W. Vuduc,et al. POET: Parameterized Optimizations for Empirical Tuning , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[23] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[24] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[25] Jack J. Dongarra,et al. Feedback-directed thread scheduling with memory considerations , 2007, HPDC '07.
[26] Chun Chen,et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy , 2005, International Symposium on Code Generation and Optimization.
[27] Cheng Wang,et al. Data locality enhancement by memory reduction , 2001, ICS '01.
[28] David G. Wonnacott,et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[29] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[30] Michael F. P. O'Boyle,et al. Rapidly Selecting Good Compiler Optimizations using Performance Counters , 2007, International Symposium on Code Generation and Optimization (CGO'07).