Efficient broadcasts and simple algorithms for parallel linear algebra computing in clusters

This paper presents a natural and efficient implementation for the classical broadcast message passing routine which optimizes performance of Ethernet based clusters. A simple algorithm for parallel matrix multiplication is specifically designed to take advantage of both, parallel computing facilities (CPUs) provided by clusters, and optimized performance of broadcast messages on Ethernet based clusters. Also, this simple parallel algorithm proposed for matrix multiplication takes into account the possibly heterogenous computing hardware and maintains a balanced workload of computers according to their relative computing power. Performance tests are presented on a heterogenous cluster as well as on a homogeneous cluster, where it is compared with the parallel matrix multiplication provided by the ScaLAPACK library. Another simple parallel algorithm is proposed for LU matrix factorization (a general method to solve dense systems of equations) following the same guidelines used for the parallel matrix multiplication algorithm. Some performance tests are presented over a homogenous cluster.

[1]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[2]  George Karypis,et al.  Introduction to Parallel Computing , 1994 .

[3]  Greg Burns,et al.  LAM: An Open Cluster Environment for MPI , 2002 .

[4]  Jack Dongarra,et al.  LAPACK: a portable linear algebra library for high-performance computers , 1990, SC.

[5]  Jack Dongarra,et al.  ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers , 1992, [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation.

[6]  Charles L. Lawson,et al.  Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.

[7]  C. A. R. Hoare,et al.  Communicating sequential processes , 1978, CACM.

[8]  Anthony Skjellum,et al.  A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..

[9]  Yves Robert,et al.  A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers) , 2001, IEEE Trans. Computers.

[10]  Yong Yan,et al.  Modeling and characterizing parallel computing performance on heterogeneous networks of workstations , 1995, Proceedings.Seventh IEEE Symposium on Parallel and Distributed Processing.

[11]  Jack Dongarra,et al.  Libraries for linear algebra , 1995 .

[12]  Jack J. Dongarra,et al.  An extended set of FORTRAN basic linear algebra subprograms , 1988, TOMS.

[13]  SkjellumAnthony,et al.  A high-performance, portable implementation of the MPI message passing interface standard , 1996 .

[14]  Lars Rzymianowicz High Speed Networks , 1999 .

[15]  Jack Dongarra,et al.  Integrated Pvm Framework Supports Heterogeneous Network Computing , 1993 .

[16]  Rajkumar Buyya,et al.  High Performance Cluster Computing: Architectures and Systems , 1999 .