Fast address translation techniques for distributed shared memory compilers

The distributed shared memory (DSM) model is designed to combine the ease of programming of the shared memory paradigm with the high performance obtained by expressing locality, as in the message-passing model. Experience, however, has shown that DSM programming languages such as UPC may fail to deliver the expected level of performance. Initial investigations have shown that one of the major reasons is the overhead of translating from the UPC memory model to the virtual address space of the target architecture, which can be very costly. Experimental measurements show that this overhead can increase execution time by up to three orders of magnitude. Previous work has also shown that some of this overhead can be avoided by hand-tuning, but at a significant cost to UPC's ease of use. Moreover, such tuning can only improve the performance of local shared accesses, not remote shared accesses. Therefore, a new technique resembling translation lookaside buffers (TLBs) is proposed here. This technique, called the memory model translation buffer (MMTB), has been implemented in the GCC-UPC compiler using two alternative strategies, full-table (FT) and reduced-table (RT). It is shown that the MMTB strategies can boost performance by up to 700%, preserving ease of programming while performing similarly to hand-tuned UPC and MPI codes.
