Scaling Properties of Parallel Applications to Exascale

A detailed profile of exascale applications helps us understand the computation, communication and memory requirements of exascale systems, and provides the insight needed to fine-tune the computing architecture. Obtaining such a profile is challenging, as exascale systems will process unprecedented amounts of data: profiling applications at the target scale would require the exascale machine itself. In this work, we propose a methodology to extrapolate the exascale profile from experimental observations on datasets that are feasible for today's machines. Extrapolation models are carefully selected by means of statistical techniques, and a high-level complexity analysis is included in the selection process to speed up the learning phase and improve the accuracy of the final model. We extrapolate run-time properties of the target applications, including the instruction mix, memory access pattern, instruction-level parallelism, and communication requirements. Compared with state-of-the-art techniques, the proposed methodology reduces the prediction error on the instruction count by an order of magnitude, and improves the accuracy by up to 1.3× for the memory access pattern and by more than 2× for the communication requirements.
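The abstract does not specify the model family or the statistical criterion used. As an illustration only, the following Python sketch assumes candidate models of the form c0 + c1 * n^a * log2(n)^b, fits each by ordinary least squares on small input sizes, ranks them with the Akaike information criterion (AIC), and lets an upper bound on the exponent a (standing in for the high-level complexity analysis) prune candidates before any fitting. The names `CANDIDATES`, `fit_linear`, `select_model`, `extrapolate` and the `max_exponent` parameter are hypothetical; this is a sketch of the general idea, not the authors' method.

```python
import math

# Hypothetical candidate family (an assumption, not taken from the paper):
# f(n) = n**a * log2(n)**b for a small grid of exponents.
CANDIDATES = [(a, b) for a in (0.5, 1, 1.5, 2, 3) for b in (0, 1, 2)]


def fit_linear(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1, rss)."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    rss = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, rss


def select_model(ns, ys, max_exponent):
    """Return (a, b, c0, c1) of the lowest-AIC candidate whose polynomial
    exponent a does not exceed max_exponent, the (assumed) upper bound
    supplied by a high-level complexity analysis."""
    m = len(ns)
    best = None
    for a, b in CANDIDATES:
        if a > max_exponent:
            continue  # pruned before fitting: cheaper and avoids spurious fits
        xs = [n ** a * math.log2(n) ** b for n in ns]
        c0, c1, rss = fit_linear(xs, ys)
        k = 2  # fitted parameters: c0 and c1
        aic = m * math.log(rss / m + 1e-300) + 2 * k  # epsilon guards log(0)
        if best is None or aic < best[0]:
            best = (aic, a, b, c0, c1)
    return best[1:]


def extrapolate(model, n):
    """Evaluate the selected model at a (much larger) target input size n."""
    a, b, c0, c1 = model
    return c0 + c1 * n ** a * math.log2(n) ** b


# Usage: synthetic "instruction counts" that grow as 3 n log2 n + 100,
# observed only for small inputs n = 2**10 .. 2**16.
ns = [2 ** k for k in range(10, 17)]
ys = [3 * n * math.log2(n) + 100 for n in ns]
model = select_model(ns, ys, max_exponent=2)
print(model[:2])  # selected exponents (a, b)
print(extrapolate(model, 2 ** 30))  # prediction far beyond the observed range
```

Because every candidate here has the same two free parameters, the AIC ranking is equivalent to ranking by residual sum of squares; the penalty term only starts to matter once candidates with different numbers of terms are compared. Fitting in linear space also lets the largest observations dominate the residuals, so a weighted or log-space fit may be worth considering for real measurements.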
