A hybrid performance modeling approach for adaptive algorithm selection on hierarchical clusters

Recent advances in parallel and distributed computing have made it increasingly challenging for programmers to reach the performance potential of current systems. In addition, advances in numerical algorithms and software optimization have greatly increased the number of alternative ways to solve a given problem, which further complicates software tuning. Indeed, no single algorithm is universally the best choice for solving a given problem efficiently on all compute platforms. In this paper, we develop a framework for designing efficient parallel algorithms in hierarchical computing environments. More specifically, given multiple candidate algorithms for a particular problem, the framework combines analytical performance models with empirical approaches to automate algorithm selection, identifying the execution scheme expected to perform best in the specific setting. Preliminary experimental results obtained by implementing two different numerical kernels demonstrate the value of the hybrid performance modeling approach integrated in the framework.
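As a rough illustration of how an analytical model and empirical measurements can be combined for algorithm selection, the Python sketch below assumes the Hockney point-to-point model T(m) = α + βm. It fits α and β by least squares to measured (message size, time) pairs (the empirical part), plugs them into textbook cost formulas for three broadcast algorithms (the analytical part), and picks the candidate with the lowest predicted time. The candidate set, the cost formulas, and all names (`fit_hockney`, `select_algorithm`, `CANDIDATES`) are hypothetical choices for illustration, not the models or kernels used in this paper, and the timings in the usage lines are synthetic.

```python
import math

def fit_hockney(samples):
    """Least-squares fit of T(m) = alpha + beta * m to (message_size, time) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    sx = sum(m for m, _ in samples)
    sy = sum(t for _, t in samples)
    sxx = sum(m * m for m, _ in samples)
    sxy = sum(m * t for m, t in samples)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("message sizes must not all be equal")
    beta = (n * sxy - sx * sy) / denom   # per-byte transfer time (slope)
    alpha = (sy - beta * sx) / n         # startup latency (intercept)
    return alpha, beta

# Hypothetical analytical cost models for broadcasting m bytes to p processes
# (standard Hockney-model estimates; log2 terms assume p is roughly a power of two).
def flat_bcast(alpha, beta, m, p):
    # Root sends the full message to each of the p - 1 other processes in turn.
    return (p - 1) * (alpha + beta * m)

def binomial_bcast(alpha, beta, m, p):
    # Full message forwarded along a binomial tree: ceil(log2 p) rounds.
    return math.ceil(math.log2(p)) * (alpha + beta * m)

def scatter_allgather_bcast(alpha, beta, m, p):
    # Binomial scatter of m/p pieces followed by a ring allgather.
    return (math.ceil(math.log2(p)) + p - 1) * alpha + 2 * (p - 1) / p * beta * m

CANDIDATES = {
    "flat": flat_bcast,
    "binomial": binomial_bcast,
    "scatter_allgather": scatter_allgather_bcast,
}

def select_algorithm(samples, m, p):
    """Calibrate the model empirically, then pick the candidate with the lowest predicted time."""
    alpha, beta = fit_hockney(samples)
    predictions = {name: model(alpha, beta, m, p) for name, model in CANDIDATES.items()}
    best = min(predictions, key=predictions.get)
    return best, predictions

# Usage with synthetic timings (alpha = 10 us, beta = 1 ns/byte), 16 processes.
samples = [(m, 1e-5 + 1e-9 * m) for m in (0, 1024, 65536, 1048576)]
print(select_algorithm(samples, 8, 16)[0])                  # binomial
print(select_algorithm(samples, 64 * 1024 * 1024, 16)[0])   # scatter_allgather
```

The crossover between the latency-bound binomial tree for small messages and the bandwidth-oriented scatter-allgather for large ones is the kind of decision such a hybrid scheme automates. On a hierarchical cluster, one would presumably calibrate a separate (α, β) pair per network level (for example intra-node versus inter-node) and compose them in the cost formulas; the sketch keeps a single level for brevity.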