Productive Programming of Distributed Systems with the SHAD C++ Library

High-performance computing (HPC) is often perceived as a matter of making large-scale systems (e.g., clusters) run as fast as possible, regardless the required programming effort. However, the idea of "bringing HPC to the masses" has recently emerged. Inspired by this vision, we have designed SHAD, the Scalable High-performance Algorithms and Data-structures library [1][6]. SHAD is open source software, written in C++, for C++ developers. Unlike other HPC libraries for distributed systems, which rely on SPMD models, SHAD adopts a shared-memory programming abstraction, to make C++ programmers feel at home. Underneath, SHAD manages tasking and data-movements, moving the computation where data resides and taking advantage of asynchrony to tolerate network latency. At the bottom of his stack, SHAD can interface with multiple runtime systems: this not only improves developer's productivity, by hiding the complexity of such software and of the underlying hardware, but also greatly enhance code portability. Thanks to its abstraction layers, SHAD can indeed target different systems, ranging from laptops to HPC clusters, without any need for modifying the user-level code.We have prototyped and open-sourced the implementation of (a subset of) the C++ standard library (STL) targeting multi-node HPC clusters. Our work allows plain STL-based C++ code to scale on HPC systems, with no need for rewriting the code to exploit the complex hardware. SHAD is available under Apache v2 License at https://github.com/pnnl/SHAD. In this paper we overview the design of the SHAD library, depicting its main components: runtime systems abstractions for tasking; parallel and distributed data-structures; STL-compliant interfaces and algorithms.

[1]  Practical Distributed Programming in C++ , 2020, HPDC.

[2]  Marco Minutoli,et al.  SHAD: The Scalable High-Performance Algorithms and Data-Structures Library , 2018, 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).