A regression-based approach to scalability prediction

Many applied scientific domains are increasingly relying on large-scale parallel computation. Consequently, many large clusters now have thousands of processors. However, the ideal number of processors to use for these scientific applications varies with both the input variables and the machine under consideration, and predicting this processor count is rarely straightforward. Accurate prediction mechanisms would provide many benefits, including improving cluster efficiency and identifying system configuration or hardware issues that impede performance. We explore novel regression-based approaches to predict parallel program scalability. We use several program executions on a small subset of the processors to predict execution time on larger numbers of processors. We compare three different regression-based techniques: one based on execution time only; another that uses per-processor information only; and a third one based on the global critical path. These techniques provide accurate scaling predictions, with median prediction errors between 6.2% and 17.3% for seven applications.

[1]  Sally A. McKee,et al.  Methods of inference and learning for performance modeling of parallel applications , 2007, PPoPP.

[2]  Allen D. Malony,et al.  ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis , 2003, Euro-Par.

[3]  Jesús Labarta,et al.  A Framework for Performance Modeling and Prediction , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[4]  Anant Agarwal,et al.  Scalability of parallel machines , 1991, CACM.

[5]  Ken Kennedy,et al.  A static performance estimator to guide data partitioning decisions , 1991, PPOPP '91.

[6]  Sadaf R. Alam,et al.  Hierarchical Model Validation of Symbolic Performance Models of Scientific Kernels , 2006, Euro-Par.

[7]  John M. Mellor-Crummey,et al.  Cross-architecture performance predictions for scientific applications using parameterized models , 2004, SIGMETRICS '04/Performance '04.

[8]  The International Journal of High Performance Computing Applications— , 1998 .

[9]  Ramesh Subramonian,et al.  LogP: a practical model of parallel computation , 1996, CACM.

[10]  Jeffrey K. Hollingsworth,et al.  An API for Runtime Code Patching , 2000, Int. J. High Perform. Comput. Appl..

[11]  John L. Gustafson,et al.  Reevaluating Amdahl's law , 1988, CACM.

[12]  F. Petrini,et al.  The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[13]  Jack J. Dongarra,et al.  A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[14]  Raghu Kacker,et al.  A scalability test for parallel code , 1995, Softw. Pract. Exp..

[15]  Barton P. Miller,et al.  On-line automated performance diagnosis on thousands of processes , 2006, PPoPP '06.

[16]  Anoop Gupta,et al.  Scaling parallel programs for multiprocessors: methodology and examples , 1993, Computer.

[17]  Sally A. McKee,et al.  An Approach to Performance Prediction for Parallel Applications , 2005, Euro-Par.

[18]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[19]  Toni Cortes,et al.  PARAVER: A Tool to Visualize and Analyze Parallel Code , 2007 .

[20]  Jesús Labarta,et al.  DiP: A Parallel Program Development Environment , 1996, Euro-Par, Vol. II.

[21]  Graham F. Carey,et al.  A scalable, object-oriented finite element solver for partial differential equations on multicomputers , 1992, ICS '92.

[22]  Manish Madhukar,et al.  Performance modeling for SPMD message-passing programs , 1998 .

[23]  Matthias S. Müller,et al.  Developing Scalable Applications with Vampir, VampirServer and VampirTrace , 2007, PARCO.

[24]  Sadaf R. Alam,et al.  Performance characterization of molecular dynamics techniques for biomolecular simulations , 2006, PPoPP '06.

[25]  John M. Mellor-Crummey,et al.  Application Insight Through Performance Modeling , 2007, 2007 IEEE International Performance, Computing, and Communications Conference.

[26]  Jürgen Brehm,et al.  Performance modeling for SPMD message-passing programs , 1998, Concurr. Pract. Exp..

[27]  Patrick H. Worley,et al.  The Effect of Time Constraints on Scaled Speedup , 1990, SIAM J. Sci. Comput..

[28]  Daniel A. Reed,et al.  SvPablo: A multi-language architecture-independent performance analysis system , 1999, Proceedings of the 1999 International Conference on Parallel Processing.

[29]  Jeffrey K. Hollingsworth,et al.  Critical Path Profiling of Message Passing and Shared-Memory Programs , 1998, IEEE Trans. Parallel Distributed Syst..

[30]  Fabrizio Petrini,et al.  Predictive Performance and Scalability Modeling of a Large-Scale Application , 2001, ACM/IEEE SC 2001 Conference (SC'01).

[31]  Frank Mueller,et al.  Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[32]  M. Schulz,et al.  Extracting Critical Path Graphs from MPI Applications , 2005, 2005 IEEE International Conference on Cluster Computing.

[33]  Remzi H. Arpaci-Dusseau,et al.  Architectural Requirements and Scalability of the NAS Parallel Benchmarks , 1999, ACM/IEEE SC 1999 Conference (SC'99).