Matrix Product on Heterogeneous Master-Worker Platforms

This paper focuses on designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While the matrix product is well understood for homogeneous 2D arrays of processors (e.g., Cannon's algorithm and the ScaLAPACK outer-product algorithm), three key hypotheses make our work original and innovative:

- Centralized data. We assume that all matrix files originate from, and must be returned to, the master. The master distributes data and computations to the workers (whereas in ScaLAPACK, input and output matrices are assumed to be equally distributed among the participating resources beforehand). Typically, our approach is useful for speeding up MATLAB or SCILAB clients running on a server, which acts as the master and initial repository of files.

- Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers and the workers are connected to the master by links of different capacities. This framework is realistic when the application is deployed from the server, which is responsible for enrolling authorized resources.

- Limited memory. Because we investigate the parallelization of large problems, we cannot assume that full matrix column blocks can be stored in worker memories and reused for subsequent updates (as in ScaLAPACK).

We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (for both input and result messages), and we report a set of numerical experiments on a platform at our site. The experiments show that our matrix-product algorithm achieves smaller execution times than existing ones, while also using fewer resources.
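To make the setting concrete, below is a minimal, self-contained Python/NumPy sketch of a blocked master-worker matrix product with a per-worker memory limit. It illustrates the general framework only, not the paper's resource-selection or communication-ordering algorithms; the worker speeds, the memory parameter mem_blocks, and the round-robin block assignment are hypothetical choices made purely for this example.

import numpy as np

def master_worker_block_product(A, B, block, worker_speeds, mem_blocks):
    # Toy simulation of a blocked master-worker matrix product.
    # The master holds A (m x k) and B (k x n), cuts them into square
    # blocks of size `block`, and assigns each C block to a worker.
    # A worker keeps its C block resident and streams in only as many
    # (A, B) block pairs as fit in the remaining `mem_blocks - 1` slots,
    # mirroring the limited-memory constraint described above.
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and m % block == 0 and n % block == 0 and k % block == 0
    p, q, r = m // block, k // block, n // block

    C = np.zeros((m, n))
    work_count = {w: 0 for w in range(len(worker_speeds))}
    # Workers are visited round-robin, fastest first -- a naive placeholder
    # for the paper's resource-selection step.
    order = sorted(range(len(worker_speeds)), key=lambda w: -worker_speeds[w])
    tasks = [(i, j) for i in range(p) for j in range(r)]
    pairs_per_round = max(1, (mem_blocks - 1) // 2)  # (A, B) pairs held at once

    for t, (i, j) in enumerate(tasks):
        w = order[t % len(order)]  # worker that computes C block (i, j)
        work_count[w] += 1
        Cij = np.zeros((block, block))
        for s0 in range(0, q, pairs_per_round):  # one "round" of communications
            for s in range(s0, min(s0 + pairs_per_round, q)):
                Aik = A[i*block:(i+1)*block, s*block:(s+1)*block]
                Bkj = B[s*block:(s+1)*block, j*block:(j+1)*block]
                Cij += Aik @ Bkj  # local update performed by worker w
        C[i*block:(i+1)*block, j*block:(j+1)*block] = Cij  # sent back to master
    return C, work_count

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 12))
    B = rng.standard_normal((12, 4))
    C, counts = master_worker_block_product(
        A, B, block=2, worker_speeds=[1.0, 2.5, 0.5], mem_blocks=5)
    assert np.allclose(C, A @ B)  # the blocked product matches A @ B
    print("C blocks computed per worker:", counts)

In the sketch, each C block stays resident on its worker while pairs of A and B blocks are streamed in; how many blocks to dedicate to C versus A and B, and in which order to schedule the transfers, is precisely the question the paper's algorithms address.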

[1] Larry Carter et al. Scheduling strategies for master-slave tasking on heterogeneous processor platforms, 2004, IEEE Transactions on Parallel and Distributed Systems.

[2] Manish Parashar et al. Understanding the Behavior and Performance of Non-blocking Communications in MPI, 2004, Euro-Par.

[3] Dror Irony et al. Communication lower bounds for distributed-memory matrix multiplication, 2004, J. Parallel Distributed Comput.

[4] Jack J. Dongarra et al. Key Concepts for Parallel Out-of-Core LU Factorization, 1996, Parallel Comput.

[5] Victor Y. Pan et al. Parallel Matrix Multiplication on a Linear Array with a Reconfigurable Pipelined Bus System, 2001, IEEE Trans. Computers.

[6] Sivan Toledo et al. A survey of out-of-core algorithms in numerical linear algebra, 1999, External Memory Algorithms.

[7] Jack Dongarra et al. ScaLAPACK Users' Guide, 1997.

[8] Yves Robert et al. Matrix Multiplication on Heterogeneous Platforms, 2001, IEEE Trans. Parallel Distributed Syst.

[9] H. T. Kung et al. I/O complexity: The red-blue pebble game, 1981, STOC '81.

[10] Yuefan Deng et al. New trends in high performance computing, 2001, Parallel Computing.

[11] Yves Robert et al. A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers), 2001, IEEE Trans. Computers.

[12] Viktor K. Prasanna et al. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems, 2007, IEEE Transactions on Parallel and Distributed Systems.

[13] Yves Robert et al. Revisiting Matrix Product on Master-Worker Platforms, 2006, 2007 IEEE International Parallel and Distributed Processing Symposium.

[14] Alexey L. Lastovetsky et al. Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, 2001, J. Parallel Distributed Comput.

[15] Viktor K. Prasanna et al. Efficient collective communication in distributed heterogeneous systems, 2003, J. Parallel Distributed Comput.

[16] Lynn Elliot Cannon. A cellular computer to implement the Kalman Filter Algorithm, 1969.

[17] R. F. Freund et al. Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems, 1999, Proceedings of the Eighth Heterogeneous Computing Workshop (HCW'99).