On the Use of Performance Models for Adaptive Algorithm Selection on Heterogeneous Clusters

Because of the increasing diversity and continuous evolution of parallel systems, efficiently solving a target problem with a single algorithm, or writing programs that are both efficient and portable, is becoming increasingly challenging. In this paper, we present a generic framework that integrates performance models with adaptive techniques to design efficient parallel algorithms for heterogeneous computing environments. To illustrate our approach, we study the matrix multiplication problem and compare several parallel algorithms for it. Experiments demonstrate that accurate performance predictions obtained from analytical performance models allow us to select the most appropriate algorithm for the given problem and platform parameters.
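The core idea of model-driven selection can be sketched as follows: evaluate an analytical cost model for each candidate algorithm on the current problem size and platform, then run the one with the smallest predicted time. This is a minimal illustration under stated assumptions, not the framework or the models from the paper. The two candidates (a speed-proportional row-block algorithm and a Cannon-style algorithm with equal blocks), their cost formulas (a compute term plus a latency/bandwidth communication term), and all names such as `Platform`, `predict_row_block_hetero`, and `select_algorithm` are hypothetical simplifications.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical platform description used only for this sketch."""
    speeds: tuple[float, ...]  # sustained flop/s of each processor
    alpha: float               # per-message latency in seconds
    beta: float                # transfer time per matrix element in seconds


def predict_row_block_hetero(n: int, pf: Platform) -> float:
    """Toy model: rows of A and C split in proportion to processor speed."""
    p = len(pf.speeds)
    # Speed-proportional partitioning balances the 2n^3 flops across processors.
    t_comp = 2 * n**3 / sum(pf.speeds)
    # Assumption: a root sends all of B to every other processor, one after another.
    t_comm = (p - 1) * (pf.alpha + pf.beta * n * n)
    return t_comp + t_comm


def predict_cannon_homog(n: int, pf: Platform) -> float:
    """Toy model: Cannon-style algorithm on a q x q grid with equal blocks."""
    p = len(pf.speeds)
    q = math.isqrt(p)
    if q * q != p:
        return math.inf  # this variant requires a square processor grid
    # Equal blocks: every processor does 2n^3/p flops, so the slowest one dominates.
    t_comp = 2 * n**3 / (p * min(pf.speeds))
    # Assumption: about q shift steps, each moving one block of A and one of B
    # (the initial skew is ignored).
    t_comm = 2 * q * (pf.alpha + pf.beta * (n / q) ** 2)
    return t_comp + t_comm


# Hypothetical registry mapping algorithm names to their performance models.
MODELS = {
    "row_block_hetero": predict_row_block_hetero,
    "cannon_homog": predict_cannon_homog,
}


def select_algorithm(n: int, pf: Platform) -> tuple[str, dict[str, float]]:
    """Return the algorithm with the lowest predicted time, plus all predictions."""
    predictions = {name: model(n, pf) for name, model in MODELS.items()}
    best = min(predictions, key=predictions.get)
    return best, predictions


if __name__ == "__main__":
    hetero = Platform(speeds=(1e9, 1e9, 1e9, 1e8), alpha=1e-5, beta=1e-8)
    homog = Platform(speeds=(1e9,) * 4, alpha=1e-5, beta=1e-8)
    for label, pf in (("heterogeneous", hetero), ("homogeneous", homog)):
        best, preds = select_algorithm(1000, pf)
        print(label, best, {k: round(v, 4) for k, v in preds.items()})
```

Under these toy models and parameters, the homogeneous platform selects `cannon_homog`, because it moves less data for the same balanced compute time, while the platform with one slow processor selects `row_block_hetero`, because the equal-block layout must wait for the slowest node. Real models would need measured speeds and network parameters and would cover more algorithms.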