Data partitioning for multiprocessors with memory heterogeneity and memory constraints

The paper presents a performance model that can be used to optimally distribute computations over heterogeneous computers. The model is application-centric: it represents the speed of each computer as a function of the problem size. In this way it accounts for processor heterogeneity, heterogeneity of the memory structure, and the memory limits at each level of the memory hierarchy. The problem of optimally partitioning an n-element set over p heterogeneous processors under this performance model is formulated, and an efficient solution of complexity O(p^3 × log_2 n) is given.
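To make the idea of a size-dependent speed function concrete, the following is a minimal sketch of one way to partition n elements so that all processors finish at about the same time. It is not the paper's algorithm and does not attain the stated complexity bound; it swaps in a plain bisection on the common execution time. The names `max_load_within` and `partition`, the per-processor element limits, and the example speed functions are all hypothetical, and the sketch assumes that each processor's execution time x / s(x) is nondecreasing in x.

```python
from typing import Callable, List, Sequence

# Hypothetical type: speed(x) = elements processed per unit time at problem size x.
Speed = Callable[[int], float]


def max_load_within(speed: Speed, limit: int, t: float) -> int:
    """Largest x in [0, limit] with x / speed(x) <= t.

    Assumes the execution time x / speed(x) is nondecreasing in x.
    """
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2  # mid >= 1, so speed(0) is never evaluated
        if mid / speed(mid) <= t:
            lo = mid
        else:
            hi = mid - 1
    return lo


def partition(n: int, speeds: Sequence[Speed], limits: Sequence[int],
              iters: int = 60) -> List[int]:
    """Split n elements so that no processor exceeds a near-minimal common time.

    limits[i] is an assumed memory bound (in elements) for processor i.
    """
    if sum(limits) < n:
        raise ValueError("memory limits cannot hold n elements")
    # Upper bound on the common time: every processor filled to its limit.
    lo = 0.0
    hi = max(b / s(b) for s, b in zip(speeds, limits) if b > 0)
    for _ in range(iters):
        mid = (lo + hi) / 2
        total = sum(max_load_within(s, b, mid) for s, b in zip(speeds, limits))
        if total >= n:
            hi = mid  # time mid is enough to place all n elements
        else:
            lo = mid
    loads = [max_load_within(s, b, hi) for s, b in zip(speeds, limits)]
    # Integer rounding may overshoot n; removing elements never raises the time.
    surplus = sum(loads) - n
    for i in range(len(loads)):
        take = min(surplus, loads[i])
        loads[i] -= take
        surplus -= take
    return loads


if __name__ == "__main__":
    # Illustrative speed functions only: the second processor slows down
    # sharply once its share exceeds 1000 elements (for example, due to paging).
    def steady(x: int) -> float:
        return 100.0

    def paging(x: int) -> float:
        return 200.0 if x <= 1000 else 50.0

    print(partition(1500, [steady, paging], [5000, 5000]))
```

A constant-speed model would give the faster processor a fixed fraction of the work regardless of n; with a speed function, the share assigned to a processor stops growing once a larger share would push it into a slower region of its memory hierarchy.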
