Compiling affine loop nests for distributed-memory parallel architectures
[1] Feng Liu, et al. Scalable Speculative Parallelization on Commodity Clusters, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[2] Monica S. Lam, et al. Communication optimization and code generation for distributed memory machines, 1993, PLDI '93.
[3] Ken Kennedy, et al. Automatic data layout for distributed-memory machines, 1998, TOPL.
[4] John A. Chandy, et al. The Paradigm Compiler for Distributed-Memory Multicomputers, 1995, Computer.
[5] Jingling Xue, et al. Loop Tiling for Parallelism, 2000, Kluwer International Series in Engineering and Computer Science.
[6] Paul Feautrier, et al. Some efficient solutions to the affine scheduling problem. I. One-dimensional time, 1992, International Journal of Parallel Programming.
[7] Peiyi Tang, et al. Reducing data communication overhead for DOACROSS loop nests, 1994, ICS '94.
[8] Sven Verdoolaege, et al. isl: An Integer Set Library for the Polyhedral Model, 2010, ICMS.
[9] Uday Bondhugula, et al. Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors, 2009, PPoPP '09.
[10] Cédric Bastoul, et al. Productivity via Automatic Code Generation for PGAS Platforms with the R-Stream Compiler, 2009.
[11] Robert J. Fowler, et al. Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations, 2003, J. Parallel Distributed Comput.
[12] Bradley C. Kuszmaul, et al. The Pochoir stencil compiler, 2011, SPAA '11.
[13] J. Ramanujam, et al. Compile-Time Techniques for Data Distribution in Distributed Memory Machines, 1991, IEEE Trans. Parallel Distributed Syst.
[14] Rudolf Eigenmann, et al. A hybrid approach of OpenMP for clusters, 2012, PPoPP '12.
[15] Armin Größlinger. Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes, 2009, CC.
[16] Uday Bondhugula, et al. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, 2008, CC.
[17] Uday Bondhugula, et al. Combined iterative and model-driven optimization in an automatic parallelization framework, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Monica S. Lam, et al. Global optimizations for parallelism and locality on scalable parallel machines, 1993, PLDI '93.
[19] Monica S. Lam, et al. Maximizing Parallelism and Minimizing Synchronization with Affine Partitions, 1998, Parallel Comput.
[20] Nectarios Koziris, et al. Message-passing code generation for non-rectangular tiling transformations, 2006, Parallel Comput.
[21] Martin Griebl, et al. Automatic Parallelization of Loop Programs for Distributed Memory Architectures, 2004.
[22] Uday Bondhugula, et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, 2008, PPoPP.
[23] Monica S. Lam, et al. Data and computation transformations for multiprocessors, 1995, PPOPP '95.
[24] Message Passing Interface Forum. MPI: A message-passing interface standard, 1994.
[25] Monica S. Lam, et al. An affine partitioning algorithm to maximize parallelism and minimize communication, 1999, ICS '99.
[26] Martin Griebl, et al. Automatic code generation for distributed memory architectures in the polytope model, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[27] David Parello, et al. Facilitating the search for compositions of program transformations, 2005, ICS '05.
[28] Uday Bondhugula, et al. Generating efficient data movement code for heterogeneous architectures with distributed-memory, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[29] Vikram S. Adve, et al. Using integer sets for data-parallel program analysis and optimization, 1998, PLDI.
[30] Ken Kennedy, et al. Advanced optimization strategies in the Rice dHPF compiler, 2002, Concurr. Comput. Pract. Exp.
[31] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.
[32] Monica S. Lam, et al. Blocking and array contraction across arbitrarily nested loops using affine partitioning, 2001, PPoPP '01.