Layer Based Partition for Matrix Multiplication on Heterogeneous Mesh Networks

Although many approaches to parallel matrix multiplication have been proposed, few address heterogeneous networks. Finding the optimal schedule that balances the load across a heterogeneous processor set while minimizing the communication volume remains an open problem. Many studies build on rectangular partitioning, yet its optimality as a basis has not been well justified. In this paper, we propose an alternative approach called layer based partition (LBP), which jointly optimizes the total communication volume and the task completion time. We also take network topology into account by applying LBP to mesh networks. Simulations show that LBP reduces the total communication volume by 81% while balancing the load among all heterogeneous processors in the mesh network.
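
To make the trade-off between load balance and communication volume concrete, the following minimal sketch illustrates the rectangular-partition baseline mentioned above. It is not the LBP method. It assumes the commonly used cost model in which the result matrix is a unit square, each processor owns a rectangle whose area is proportional to its speed, and the data it must receive is proportional to the rectangle's half-perimeter (height plus width). The function names and the example speeds are hypothetical.

```python
import math

def normalized_areas(speeds):
    """Load balance: each processor's share of the unit square
    is proportional to its relative speed."""
    total = sum(speeds)
    return [s / total for s in speeds]

def strip_partition_cost(areas):
    """Communication cost (sum of half-perimeters) when every processor
    gets a full-height column strip of width equal to its area."""
    return sum(1.0 + a for a in areas)

def square_lower_bound(areas):
    """Lower bound on the same cost: a rectangle of area a has
    half-perimeter at least 2*sqrt(a), attained by a square."""
    return sum(2.0 * math.sqrt(a) for a in areas)

# Hypothetical relative speeds of four heterogeneous processors.
speeds = [1.0, 2.0, 3.0, 4.0]
areas = normalized_areas(speeds)

strip_cost = strip_partition_cost(areas)
bound = square_lower_bound(areas)
print(f"strip partition cost: {strip_cost:.3f}")
print(f"lower bound:          {bound:.3f}")
```

Under these assumptions the naive strip layout is perfectly load balanced but its communication cost sits well above the lower bound, which is the gap that partitioning schemes try to close.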
