Region-based parallelization of irregular reductions on explicitly managed memory hierarchies
[1] Ken Kennedy,et al. Improving memory hierarchy performance for irregular applications , 1999, ICS '99.
[2] Wm. A. Wulf,et al. Hitting the memory wall , 1995 .
[3] Emilio L. Zapata,et al. An analytical model of locality-based parallel irregular reductions , 2008, Parallel Comput..
[4] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[5] Michael Gschwind,et al. Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture , 2006, IBM Syst. J..
[6] Joel H. Saltz,et al. Principles of runtime support for parallel processors , 1988, ICS '88.
[7] Ken Kennedy,et al. Improving cache performance in dynamic applications through data and computation reorganization at run time , 1999, PLDI '99.
[8] Emilio L. Zapata,et al. Data partitioning‐based parallel irregular reductions , 2004, Concurr. Comput. Pract. Exp..
[9] Rudolf Eigenmann,et al. Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation , 2003, LCPC.
[10] Eduard Ayguadé,et al. Nanos mercurium: A research compiler for OpenMP , 2004 .
[11] H. Peter Hofstee,et al. Power efficient processor architecture and the cell processor , 2005, 11th International Symposium on High-Performance Computer Architecture.
[12] Nir Shavit,et al. Software transactional memory , 1995, PODC '95.
[13] IBM Redbooks,et al. Programming the Cell Broadband Engine Architecture: Examples and Best Practices , 2008 .
[14] Zhiyuan Li. Array privatization for parallel execution of loops , 1992, ICS.
[15] P. Hanrahan,et al. Sequoia: Programming the Memory Hierarchy , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[16] David A. Padua,et al. On the Automatic Parallelization of Sparse and Irregular Fortran Programs , 1998, LCR.
[17] Paul Feautrier,et al. Array expansion , 1988, ICS '88.
[18] Larry Carter,et al. Compile-time composition of run-time data and iteration reorderings , 2003, PLDI '03.
[19] Kathryn M. O'Brien,et al. Optimizing the Use of Static Buffers for DMA on a CELL Chip , 2006, LCPC.
[20] William J. Dally,et al. Sequoia: Programming the Memory Hierarchy , 2006, International Conference on Software Composition.
[21] Chau-Wen Tseng,et al. Exploiting locality for irregular scientific codes , 2006, IEEE Transactions on Parallel and Distributed Systems.
[22] Chau-Wen Tseng,et al. A Comparison of Locality Transformations for Irregular Codes , 2000, LCR.
[23] David A. Padua,et al. Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs , 1991, LCPC.
[24] Benjamin Rose,et al. A comparison of programming models for multiprocessors with explicitly managed memory hierarchies , 2009, PPoPP '09.
[25] M. Karplus,et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations , 1983 .
[26] Kunle Olukotun,et al. Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[27] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[28] Michael Gschwind,et al. Optimizing Compiler for the CELL Processor , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[29] Rosa M. Badia,et al. CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[30] P. Feautrier. Array expansion , 1988 .
[31] William J. Dally,et al. Scatter-add in data parallel architectures , 2005, 11th International Symposium on High-Performance Computer Architecture.
[32] Larry Carter,et al. Localizing non-affine array references , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[33] M. Frans Kaashoek,et al. tcc: A Template-Based Compiler for ‘C , 2007 .
[34] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.