Reducing data access latency in SDSM systems using runtime optimizations

Software Distributed Shared Memory (SDSM) systems offer a convenient way to run applications written for shared-memory machines on distributed systems without modifying them. However, because an SDSM adds an extra layer of abstraction to the memory hierarchy, applications running on top of it may suffer performance problems. Our main research interest is to develop compiler and runtime techniques that widen the range of applications that can run efficiently on SDSM systems. We currently target OpenMP applications because of the ease of use of this programming model. In this paper we first show the performance of a set of regular applications that perform well on our SDSM system; they were adapted from OpenCL codes provided by ATI and rewritten in OpenMP. More complex applications with different data access patterns are harder to run efficiently on a DSM system. As an example, we present a performance evaluation of the NAS MG benchmark, together with two techniques we have developed to improve its data locality. Our SDSM infrastructure consists of NanosDSM, an everything-shared SDSM developed at the Technical University of Catalonia (UPC) and the Barcelona Supercomputing Center (BSC), and the IBM XL SMP Runtime, which supports the execution of the OpenMP applications.
