? SC (Or, Can Adding Scalable Locality to Distributed Shared Memory Yield SuperComputer Power?)
[1] Rudolf Eigenmann, et al. Optimizing OpenMP Programs on Software Distributed Shared Memory Systems, 2004, International Journal of Parallel Programming.
[2] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[3] Rudolf Eigenmann, et al. Towards OpenMP Execution on Software Distributed Shared Memory Systems, 2002, ISHPC.
[4] Robert A. van de Geijn, et al. Satisfying your dependencies with SuperMatrix, 2007, 2007 IEEE International Conference on Cluster Computing.
[5] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[6] William Pugh, et al. Iteration Space Slicing for Locality, 1999, LCPC.
[7] Armin R. Mikler, et al. NetPIPE: A Network Protocol Independent Performance Evaluator, 1996.
[8] Rudolf Eigenmann, et al. Towards automatic translation of OpenMP to MPI, 2005, ICS '05.
[9] Sanjay Rajopadhye, et al. Positivity, posynomials and tile size selection, 2008, HiPC 2008.
[10] John L. Gustafson, et al. Reevaluating Amdahl's law, 1988, CACM.
[11] V. Rich. Personal communication, 1989, Nature.
[12] Uday Bondhugula, et al. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, 2008, CC.
[13] Ken Kennedy, et al. Compiling Fortran D for MIMD distributed-memory machines, 1992, CACM.
[14] François Irigoin, et al. Supernode partitioning, 1988, POPL '88.
[15] Greg Bronevetsky, et al. Communication-Sensitive Static Dataflow for Parallel Message Passing Applications, 2009, 2009 International Symposium on Code Generation and Optimization.
[16] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[17] David G. Wonnacott, et al. Achieving Scalable Locality with Time Skewing, 2002, International Journal of Parallel Programming.
[18] David G. Wonnacott, et al. Using time skewing to eliminate idle time due to memory bandwidth and network limitations, 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[19] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[20] Martin Griebl, et al. Automatic code generation for distributed memory architectures in the polytope model, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[21] G. Amdahl, et al. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).