Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs
Nectarios Koziris | Panayiotis Tsanakas | Maria Athanasaki | Aristidis Sotiropoulos | Georgios Tsoukalas
[1] Yves Robert,et al. Determining the idle time of a tiling: new results , 1997, Proceedings 1997 International Conference on Parallel Architectures and Compilation Techniques.
[2] Hamid R. Arabnia,et al. Parallel Computer Vision on a Reconfigurable Multiprocessor Network , 1997, IEEE Trans. Parallel Distributed Syst..
[3] Nectarios Koziris,et al. Minimizing completion time for loop tiling with computation and communication overlapping , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[4] Richard P. Martin,et al. Effects Of Communication Latency, Overhead, And Bandwidth In A Cluster Architecture , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[5] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[6] Knut Omang,et al. VIA over SCI - consequences of a zero copy implementation, and comparison with VIA over myrinet , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[7] Weijia Shang,et al. On Time Optimal Supernode Shape , 2002, IEEE Trans. Parallel Distributed Syst..
[8] Viktor K. Prasanna,et al. Tiling, Block Data Layout, and Memory Hierarchy Performance , 2003, IEEE Trans. Parallel Distributed Syst..
[9] Corinne Ancourt,et al. Scanning polyhedra with DO loops , 1991, PPOPP '91.
[10] Matthias A. Blumrich. Network interface for protected, user-level communication , 1996 .
[11] Weijia Shang,et al. On supernode transformation with minimized total running time , 1996, Proceedings of International Conference on Application Specific Systems, Architectures and Processors: ASAP '96.
[12] J.P. Singh,et al. Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[13] Tarek S. Abdelrahman,et al. Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors , 2001, IEEE Trans. Parallel Distributed Syst..
[14] Chung-Ta King,et al. Pipelined Data Parallel Algorithms-II: Design , 1990, IEEE Trans. Parallel Distributed Syst..
[15] Sanjay V. Rajopadhye,et al. A Geometric Programming Framework for Optimal Multi-Level Tiling , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[16] Jingling Xue. Communication-Minimal Tiling of Uniform Dependence Loops , 1997, J. Parallel Distributed Comput..
[17] Yves Robert,et al. (Pen)-ultimate tiling? , 1994, Integr..
[18] Angelos Bilas,et al. User-Space Communication: A Quantitative Study , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[19] Charles L. Seitz,et al. Myrinet: A Gigabit-per-Second Local Area Network , 1995, IEEE Micro.
[20] Donald J. Patterson,et al. Computer organization and design: the hardware-software interface (appendix A) , 1993 .
[21] Mahmut T. Kandemir,et al. Improving Cache Locality by a Combination of Loop and Data Transformation , 1999, IEEE Trans. Computers.
[22] Larry Carter,et al. Determining the idle time of a tiling , 1997, POPL '97.
[23] Hamid R. Arabnia,et al. Parallel stereocorrelation on a reconfigurable multi-ring network , 1996, The Journal of Supercomputing.
[24] Hiroshi Tezuka,et al. Pin-down cache: a virtual memory management technique for zero-copy communication , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[25] Nectarios Koziris,et al. An Efficient Code Generation Technique for Tiled Iteration Spaces , 2003, IEEE Trans. Parallel Distributed Syst..
[26] Rajeev Barua,et al. The sensitivity of communication mechanisms to bandwidth and latency , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[27] David A. Patterson,et al. Computer Organization & Design: The Hardware/Software Interface , 1993 .
[28] Chung-Ta King,et al. Pipelined Data Parallel Algorithms-I , 1990 .
[29] P. Wyckoff,et al. EMP: Zero-Copy OS-Bypass NIC-Driven Gigabit Ethernet Message Passing , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[30] Yutaka Ishikawa,et al. MPICH-PM: Design and Implementation of Zero Copy MPI for PM , 1998 .
[31] Hermann Hellwagner. The SCI Standard and Applications of SCI , 1999, Scalable Coherent Interface.
[32] Nectarios Koziris,et al. Scheduling of tiled nested loops onto a cluster with a fixed number of SMP nodes , 2004, 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2004. Proceedings..
[33] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[34] Nectarios Koziris,et al. A pipelined execution of tiled nested loops on SMPs with computation and communication overlapping , 2002, Proceedings. International Conference on Parallel Processing Workshop.
[35] Jang-Ping Sheu,et al. Partitioning and mapping of nested loops for linear array multicomputers , 1995, The Journal of Supercomputing.
[36] Larry Carter,et al. Selecting tile shape for minimal execution time , 1999, SPAA '99.
[37] Jang-Ping Sheu,et al. Partitioning and Mapping Nested Loops on Multiprocessor Systems , 1991, IEEE Trans. Parallel Distributed Syst..
[38] Jingling Xue,et al. On Tiling as a Loop Transformation , 1997, Parallel Process. Lett..
[39] Hermann Hellwagner,et al. SCI: Scalable Coherent Interface: Architecture and Software for High-Performance Compute Clusters , 1999 .
[40] Nectarios Koziris,et al. Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs Using Memory Mapped Network Interfaces , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[41] Nectarios Koziris,et al. Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays , 2000, IEEE Trans. Parallel Distributed Syst..
[42] Caliper Corp. Virtual interface architecture specification , 1997 .
[43] Nectarios Koziris,et al. Enhancing the performance of tiled loop execution onto clusters using memory mapped network interfaces and pipelined schedules , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[44] J. Ramanujam,et al. Tiling Multidimensional Iteration Spaces for Multicomputers , 1992, J. Parallel Distributed Comput..
[45] Nectarios Koziris,et al. Evaluation of loop grouping methods based on orthogonal projection spaces , 2000, Proceedings 2000 International Conference on Parallel Processing.
[46] Nectarios Koziris,et al. Optimal Scheduling for UET/UET-UCT Generalized n-Dimensional Grid Task Graphs , 1999, J. Parallel Distributed Comput..
[47] Larry Carter,et al. On the Parallel Execution Time of Tiled Loops , 2003, IEEE Trans. Parallel Distributed Syst..
[48] Thorsten von Eicken,et al. U-Net: a user-level network interface for parallel and distributed computing , 1995, SOSP.
[49] Richard P. Martin,et al. Modeling communication pipeline latency , 1998, SIGMETRICS '98/PERFORMANCE '98.
[50] Andrew A. Chien,et al. Software overhead in messaging layers: where does the time go? , 1994, ASPLOS VI.