Communication-Aware Supernode Shape
Nectarios Koziris | Georgios I. Goumas | Nikolaos Drosinos
[1] Robert Michael Kirby, et al. Parallel Scientific Computing in C++ and MPI - A Seamless Approach to Parallel Algorithms and their Implementation, 2003.
[2] Weijia Shang, et al. On Supernode Transformation with Minimized Total Running Time, 1998, IEEE Trans. Parallel Distributed Syst.
[3] Larry Carter, et al. Selecting tile shape for minimal execution time, 1999, SPAA '99.
[4] Nectarios Koziris, et al. Performance comparison of pure MPI vs hybrid MPI-OpenMP parallelization models on SMP clusters, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[5] Berardino D'Acunto. Computational Methods for PDE in Mechanics - (With CD-ROM), 2004, Series on Advances in Mathematics for Applied Sciences.
[6] François Irigoin, et al. Supernode partitioning, 1988, POPL '88.
[7] Peiyi Tang, et al. Reducing data communication overhead for DOACROSS loop nests, 1994, ICS '94.
[8] Larry Carter, et al. On the Parallel Execution Time of Tiled Loops, 2003, IEEE Trans. Parallel Distributed Syst.
[9] Jingling Xue, et al. On Tiling as a Loop Transformation, 1997, Parallel Process. Lett.
[10] Keshav Pingali, et al. Tiling Imperfectly-nested Loop Nests, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[11] Sanjay V. Rajopadhye, et al. Parameterized tiled loops for free, 2007, PLDI '07.
[12] Monica S. Lam, et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism, 1991, IEEE Trans. Parallel Distributed Syst.
[13] Rumen Andonov, et al. First Steps Towards Optimal Oblique Tile Sizing, 2007.
[14] Sanjay V. Rajopadhye, et al. Optimal Semi-Oblique Tiling, 2003, IEEE Trans. Parallel Distributed Syst.
[15] Mahmut Kandemir, et al. A Unified Tiling Approach for Out-Of-Core Computations, 1996.
[16] N. E. Hoskin. The solution of partial differential equations, 1989.
[17] Chau-Wen Tseng, et al. Tiling Optimizations for 3D Scientific Computations, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[18] Nectarios Koziris, et al. Minimizing completion time for loop tiling with computation and communication overlapping, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[19] Erik H. D'Hollander, et al. Partitioning and Labeling of Loops by Unimodular Transformations, 1992, IEEE Trans. Parallel Distributed Syst.
[20] Yves Robert, et al. (Pen)-ultimate tiling?, 1994, Integr.
[21] Saeed Parsa, et al. A New Genetic Algorithm for Loop Tiling, 2006, The Journal of Supercomputing.
[22] Wentong Cai, et al. Time-minimal tiling when rise is larger than zero, 2002, Parallel Comput.
[23] Jingling Xue, et al. Communication-Minimal Tiling of Uniform Dependence Loops, 1996, J. Parallel Distributed Comput.
[24] Zhiyuan Li, et al. Impact of Tile-Size Selection for Skewed Tiling, 2001.
[25] W. Shang, et al. On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays, 1992, IEEE Trans. Parallel Distributed Syst.
[26] Peiyi Tang, et al. Generating efficient tiled code for distributed memory machines, 2000, Parallel Comput.
[27] William H. Press, et al. The Art of Scientific Computing Second Edition, 1998.
[28] F. A. Seiler, et al. Numerical Recipes in C: The Art of Scientific Computing, 1989.
[29] Nectarios Koziris, et al. An Efficient Code Generation Technique for Tiled Iteration Spaces, 2003, IEEE Trans. Parallel Distributed Syst.
[30] Yves Robert, et al. Static tiling for heterogeneous computing platforms, 1999, Parallel Comput.
[31] Weijia Shang, et al. Time Optimal Linear Schedules for Algorithms with Uniform Dependencies, 1991, IEEE Trans. Computers.
[32] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[33] Yves Robert, et al. Linear Scheduling Is Nearly Optimal, 1991, Parallel Process. Lett.
[34] Uday Bondhugula, et al. Effective automatic parallelization of stencil computations, 2007, PLDI '07.
[35] Nectarios Koziris, et al. A pipelined schedule to minimize completion time for loop tiling with computation and communication overlapping, 2003, J. Parallel Distributed Comput.
[36] Hiroshi Ohta, et al. Optimal tile size adjustment in compiling general DOACROSS loop nests, 1995, ICS '95.
[37] Berardino D'Acunto. Computational Methods For PDE In Mechanics, 2004.
[38] José A. B. Fortes, et al. Time optimal linear schedules for algorithms with uniform dependencies, 1988, [1988] Proceedings. International Conference on Systolic Arrays.
[39] Mahmut T. Kandemir, et al. A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations, 2000, IEEE Trans. Parallel Distributed Syst.
[40] Nectarios Koziris, et al. Message-passing code generation for non-rectangular tiling transformations, 2006, Parallel Comput.
[41] Weijia Shang, et al. On Time Optimal Supernode Shape, 2002, IEEE Trans. Parallel Distributed Syst.