Load Partitioning and Trade-Off Study for Large Matrix-Vector Computations in Multicast Bus Networks with Communication Delays

In this paper we consider the problem of computing a large matrix-vector product in a network-based distributed computing environment whose computers are equipped with communication co-processors that can be used to off-load communication. Communication delays, which are significant in such systems, are explicitly taken into account. The main contribution of this study is to show that the optimal load partitioning, and the resulting performance of the network, depend critically on several network parameters and load characteristics. In particular, it is shown that the size of the load plays an important role in determining the performance of the network. We consider only row-wise striping of the matrix in order to better allocate the computational burden among the processors. We derive closed-form solutions to the optimal load partitioning problem and show the existence of optimal load sharing conditions. A practically relevant trade-off study, from the architecture point of view, between the number of processors and the bus bandwidth is presented. Several practical load distribution strategies are considered, and a complete analysis of each is presented.
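
As an illustration of row-wise striping only, the following minimal Python sketch splits the rows of a matrix into contiguous stripes according to a vector of load fractions and computes each stripe's part of the product independently. The function names `split_rows` and `striped_matvec` and the example fractions are hypothetical, chosen for illustration; the fractions are arbitrary inputs and are not the closed-form optimal partition derived in the paper, and the sketch ignores communication delays entirely.

```python
from typing import List, Sequence


def split_rows(n_rows: int, fractions: Sequence[float]) -> List[range]:
    """Split row indices 0..n_rows-1 into contiguous stripes.

    `fractions` are assumed non-negative load fractions that sum to 1;
    stripe i receives roughly fractions[i] * n_rows rows.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    bounds = [0]
    cumulative = 0.0
    for f in fractions:
        cumulative += f
        bounds.append(min(n_rows, round(cumulative * n_rows)))
    bounds[-1] = n_rows  # guard against rounding drift on the last stripe
    return [range(bounds[i], bounds[i + 1]) for i in range(len(fractions))]


def striped_matvec(A: Sequence[Sequence[float]],
                   x: Sequence[float],
                   fractions: Sequence[float]) -> List[float]:
    """Compute y = A x one row stripe at a time.

    In a distributed setting each stripe would be sent to a different
    processor; here the stripes are simply processed in sequence.
    """
    y = [0.0] * len(A)
    for stripe in split_rows(len(A), fractions):
        for i in stripe:
            y[i] = sum(a * b for a, b in zip(A[i], x))
    return y
```

Because each output entry depends on a single row of the matrix and the whole vector, the stripes need no coordination beyond receiving the vector and returning their partial results, which is why the partitioning question reduces to choosing the fractions.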
