Static Data Allocation and Load Balancing Techniques for Heterogeneous Systems

Yves Robert | Arnaud Legrand | Fabrice Rastello | Olivier Beaumont | Vincent Boudet

[1] Michael Werman et al. The decomposition of a square into rectangles of minimal perimeter, 1987, Discret. Appl. Math.

[2] Michael J. Quinn et al. Block data decomposition for data-parallel programming on a heterogeneous workstation network, 1993, Proceedings of the 2nd International Symposium on High Performance Distributed Computing.

[3] James Demmel et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[4] D. West. Introduction to Graph Theory, 1995.

[5] Alexey L. Lastovetsky et al. Heterogeneous Distribution of Computations While Solving Linear Algebra Problems on Networks of Heterogeneous Computers, 1999, HPCN Europe.

[6] Jaswinder Pal Singh et al. The effects of communication parameters on end performance of shared virtual memory clusters, 1997, SC '97.

[7] Srinivasan Parthasarathy et al. Cashmere-2L: software coherent shared memory on a clustered remote-write network, 1997, SOSP.

[8] Andrew S. Grimshaw et al. The Legion vision of a worldwide virtual computer, 1997, Commun. ACM.

[9] Galen C. Hunt et al. VM-based Shared Memory on Low-latency, Remote-memory-access Networks, 1996, Proceedings of the 24th Annual International Symposium on Computer Architecture.

[10] Ricardo Bianchini et al. Lazy release consistency for hardware-coherent multiprocessors, 1995.

[11] Hochang Lee et al. A Hybrid Bounding Procedure for the Workload Allocation Problem on Parallel Unrelated Machines with Setups, 1996.

[12] Paul Hudak et al. Memory coherence in shared virtual memory systems, 1989, TOCS.

[13] Liviu Iftode et al. Improving release-consistent shared virtual memory using automatic update, 1996, Proceedings of the Second International Symposium on High-Performance Computer Architecture.

[14] James R. Larus et al. Implementing Fine-grain Distributed Shared Memory on Commodity SMP Workstations, 1996.

[15] Michael Wolfe et al. High performance compilers for parallel computing, 1995.

[16] Chelliah Sriskandarajah et al. Parallel machine scheduling with a common server, 2000, Discret. Appl. Math.

[17] Liviu Iftode et al. Performance evaluation of two home-based lazy release consistency protocols for shared virtual memory systems, 1996, OSDI '96.

[18] Angelos Bilas et al. Shared Virtual Memory Clusters with Next-Generation Interconnection Networks and Wide Compute Nodes, 2001, HiPC.

[19] Larry Carter et al. Determining the idle time of a tiling, 1997, POPL '97.

[20] Prosenjit Bose et al. Cutting rectangles in equal area pieces, 1998, CCCG.

[21] Jason Maassen et al. Parallel Computing on Wide-Area Clusters: the Albatross Project, 1999.

[22] Yuanyuan Zhou et al. Limits to the performance of software shared memory: a layered approach, 1999, Proceedings of the Fifth International Symposium on High-Performance Computer Architecture.

[23] Greg J. Regnier et al. The Virtual Interface Architecture, 2002, IEEE Micro.

[24] Hiroshi Ohta et al. Optimal tile size adjustment in compiling general DOACROSS loop nests, 1995, ICS '95.

[25] Kai Li et al. Understanding Application Performance on Shared Virtual Memory Systems, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA '96).

[26] Donald Yeung et al. Multigrain shared memory, 2000, TOCS.

[27] Ian T. Foster et al. Globus: a Metacomputing Infrastructure Toolkit, 1997, Int. J. High Perform. Comput. Appl.

[28] Arnold L. Rosenberg et al. Sharing partitionable workloads in heterogeneous NOWs: greedier is not better, 2001, Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science.

[29] Yves Robert et al. A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers), 2001, IEEE Trans. Computers.

[30] Jaswinder Pal Singh et al. Application restructuring and performance portability on shared virtual memory and hardware-coherent multiprocessors, 1997, PPOPP '97.

[31] Kunle Olukotun et al. The Benefits of Clustering in Shared Address Space Multiprocessors: An Applications-Driven Investigation, 1995, Proceedings of the IEEE/ACM SC95 Conference.

[32] Henri Casanova et al. NetSolve: A Network Server for Solving Computational Science Problems, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[33] Kai Li et al. Design and implementation of virtual memory-mapped communication on Myrinet, 1997, Proceedings of the 11th International Parallel Processing Symposium.

[34] Cezary Dubnicki et al. VMMC-2: Efficient Support for Reliable, Connection-Oriented Communication, 1997.

[35] Michael L. Scott et al. Using memory-mapped network interfaces to improve the performance of distributed shared memory, 1996, Proceedings of the Second International Symposium on High-Performance Computer Architecture.

[36] Kai Li et al. Retrospective: virtual memory mapped network interface for the SHRIMP multicomputer, 1994, ISCA '98.

[37] Geoffrey C. Fox et al. Matrix algorithms on a hypercube I: Matrix multiplication, 1987, Parallel Comput.

[38] David S. Johnson et al. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1978.

[39] Brian N. Bershad et al. Fast Interrupt Priority Management in Operating System Kernels, 1993, USENIX Microkernels and Other Kernel Architectures Symposium.

[40] A. W. Roscoe et al. The Decomposition of a Rectangle into Rectangles of Minimal Perimeter, 1988, SIAM J. Comput.

[41] Willy Zwaenepoel et al. Munin: distributed shared memory based on type-specific memory coherence, 1990, PPOPP '90.

[42] Guy L. Steele et al. The High Performance Fortran Handbook, 1993.

[43] Seth Copen Goldstein et al. Active messages: a mechanism for integrating communication and computation, 1998, ISCA '98.

[44] Stephen Taylor et al. A Practical Approach to Dynamic Load Balancing, 1998, IEEE Trans. Parallel Distributed Syst.

[45] Peter Brucker et al. Complexity results for parallel machine problems with a single server, 2002.

[46] Thorsten von Eicken et al. U-Net: a user-level network interface for parallel and distributed computing, 1995, SOSP.

[47] Svetlana A. Kravchenko et al. Parallel machine scheduling problems with a single server, 1997.

[48] Jaeyoung Choi et al. Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, 1994, Sci. Program.

[49] Kevin Skadron et al. Design issues and tradeoffs for write buffers, 1997, Proceedings of the Third International Symposium on High-Performance Computer Architecture.

[50] Michel Minoux et al. Graphs and Algorithms, 1984.

[51] H. T. Kung et al. Path Planning On The Warp Computer: Using A Linear Systolic Array In Dynamic Programming, 1988, Optics & Photonics.

[52] Alan L. Cox et al. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems, 1994, USENIX Winter.

[53] Ramesh C. Agarwal et al. A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication, 1994, IBM J. Res. Dev.

[54] Anant Agarwal et al. APRIL: a processor architecture for multiprocessing, 1990, ISCA '90.

[55] Jack Dongarra et al. PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing, 1995.

[56] John L. Hennessy et al. SoftFLASH: analyzing the performance of clustered distributed virtual shared memory, 1996, ASPLOS VII.

[57] Charles L. Seitz et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.

[58] Ricardo Bianchini et al. Hiding communication latency and coherence overhead in software DSMs, 1996, ASPLOS VII.

[59] Jack Dongarra et al. MPI: The Complete Reference, 1996.

[60] Yves Robert et al. Matrix Multiplication on Heterogeneous Platforms, 2001, IEEE Trans. Parallel Distributed Syst.

[61] Liviu Iftode et al. Relaxed consistency and coherence granularity in DSM systems: a performance evaluation, 1997, PPOPP '97.

[62] Per Stenström et al. Performance evaluation of a cluster-based multiprocessor built from ATM switches and bus-based multiprocessor servers, 1996, Proceedings of the Second International Symposium on High-Performance Computer Architecture.

[63] Guy Lemieux et al. The NUMAchine multiprocessor, 2000, Proceedings of the 2000 International Conference on Parallel Processing.