An approximate method for optimizing HPC component applications in the presence of multiple component implementations

The Common Component Architecture (CCA) allows computational scientists to adopt a component-based architecture for scientific simulation codes. Components, which in the scientific context usually embody a numerical solution facility or a physical or numerical model, are composed at runtime into a simulation code by loading an implementation of each component and linking it to the others. However, a component may admit multiple implementations, depending on the choice of algorithm, data structure, parallelization strategy, etc., leaving the user with the problem of choosing the “correct” implementations to achieve an optimal (fastest) component assembly. Even under the assumption that a performance model exists for each implementation of each component, simply choosing the optimal implementation of each component does not guarantee an optimal component assembly, since components interact with each other. An optimal solution may be obtained by evaluating the performance of every possible realization of a component assembly, given the components and all their implementations, but the exponential complexity renders this approach infeasible as the number of components and the number of their implementations rise. We propose an approximate approach predicated on the existence, identification and optimization of computationally dominant sub-assemblies (cores). We propose a simple criterion to test for the existence of such cores and a set of rules to prune a component assembly and expose its dominant cores. We apply this approach to data obtained from a CCA component code simulating shock-induced turbulence on four processors and present preliminary results regarding the efficacy of this approach and the sensitivity of the final solution to various parameters in the rules.
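
The gap between per-component and whole-assembly optimization can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the component names (Mesh, Integrator), the implementation names, the cost figures, and the additive cost model with a pairwise interaction term are hypothetical and are not taken from the paper's data or performance models. The sketch does not implement the core-identification criterion or the pruning rules, which the abstract does not specify; it only contrasts greedy per-component selection with exhaustive enumeration.

```python
from itertools import product
from math import prod

# Hypothetical standalone cost (seconds) of each implementation of each component.
standalone = {
    "Mesh": {"array": 4.0, "tree": 5.0},
    "Integrator": {"explicit": 3.0, "implicit": 3.5},
}

# Hypothetical extra cost incurred when a given pair of implementations is linked.
interaction = {
    ("array", "explicit"): 4.0,
    ("array", "implicit"): 3.0,
    ("tree", "explicit"): 0.5,
    ("tree", "implicit"): 2.0,
}


def assembly_cost(choice):
    """Total cost of one realization: standalone costs plus the interaction term."""
    mesh, integrator = choice["Mesh"], choice["Integrator"]
    return (
        standalone["Mesh"][mesh]
        + standalone["Integrator"][integrator]
        + interaction[(mesh, integrator)]
    )


def greedy_choice():
    """Pick the cheapest implementation of each component in isolation."""
    return {comp: min(impls, key=impls.get) for comp, impls in standalone.items()}


def exhaustive_choice():
    """Evaluate every realization of the assembly and keep the cheapest one."""
    names = list(standalone)
    candidates = (
        dict(zip(names, combo))
        for combo in product(*(standalone[name] for name in names))
    )
    return min(candidates, key=assembly_cost)


def search_space_size():
    """Number of realizations: the product of the implementation counts."""
    return prod(len(impls) for impls in standalone.values())


if __name__ == "__main__":
    greedy, best = greedy_choice(), exhaustive_choice()
    print("greedy:    ", greedy, assembly_cost(greedy))
    print("exhaustive:", best, assembly_cost(best))
    print("realizations evaluated:", search_space_size())
```

In this toy setting the greedy selection is more expensive than the exhaustively found assembly because the interaction term penalizes the pairing of the two individually cheapest implementations. The number of realizations is the product of the per-component implementation counts, which is the exponential growth that motivates an approximate method.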
