KART - A Runtime Compilation Library for Improving HPC Application Performance

The effectiveness of ahead-of-time compiler optimization depends heavily on the amount of information available at compile time. Input-specific information that is only available at runtime cannot be used, although it often determines loop counts, branching predicates and paths, as well as memory-access patterns. It can also be crucial for generating efficient SIMD-vectorized code. This is especially relevant for the many-core architectures paving the way to exascale computing, which are more sensitive to code optimization. We explore the design space for using input-specific information at compile time and present KART, a C++ library solution that allows developers to compile, link, and execute code (e.g., C, C++, Fortran) at application runtime. Besides mere runtime compilation of performance-critical code, KART can be used to instantiate the same code multiple times using different inputs, compilers, and options. Other techniques like auto-tuning and code generation can be integrated into a KART-enabled application instead of being scripted around it. We evaluate runtimes and compilation costs for different synthetic kernels, and show the effectiveness for two real-world applications, HEOM and a WSM6 proxy.
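
The general idea of specializing code on a value that is only known at runtime can be illustrated with a minimal sketch. This is not KART's API: it swaps in Python's built-in `compile` and `exec` for the external C, C++, or Fortran compiler that KART drives, and the names `specialize_dot` and `dot` are hypothetical. The sketch generates a dot-product kernel whose loop count is baked into the source as a constant, so the loop is fully unrolled for that input size.

```python
def specialize_dot(n):
    """Generate and compile a dot product unrolled for a length n known only at runtime.

    Hypothetical helper for illustration; not part of KART.
    """
    # Bake the runtime value n into the source: one multiply-add term per element.
    terms = " + ".join(f"a[{i}] * b[{i}]" for i in range(n)) or "0.0"
    source = f"def dot(a, b):\n    return {terms}\n"

    # Compile the generated source at runtime and load the resulting function.
    namespace = {}
    code = compile(source, f"<dot_{n}>", "exec")
    exec(code, namespace)
    return namespace["dot"], source


# n stands in for an input-specific value (e.g. read from a configuration file).
n = 4
dot4, src = specialize_dot(n)
result = dot4([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
print(src)
print(result)
```

Calling `specialize_dot` again with a different `n` yields a second, independent instance of the same kernel, which loosely mirrors instantiating the same code multiple times for different inputs.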
