CableS: thread control and memory management extensions for shared virtual memory clusters

Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O-bound tasks. Although clusters have many advantages, their applicability in emerging application areas has been limited. One of the main reasons is that clusters do not provide a single system image and are therefore hard to program. In this work we address this problem by providing a single-cluster image with respect to thread and memory management. We implement our system, CableS (Cluster enabled threads), on a 32-processor cluster interconnected with a low-latency, high-bandwidth system area network, and conduct an early exploration of the costs involved in providing the extra functionality. We demonstrate the versatility of CableS with a wide range of applications and show that clusters can support applications written for more expensive, tightly coupled systems with very little effort on the programmer's side: (a) We run legacy pthreads applications without any major modifications. (b) We use a public domain OpenMP compiler (OdinMP) to translate OpenMP programs to pthreads and execute them on our system, with few or no modifications to the translated pthreads source code. (c) We provide an implementation of the M4 macros for our pthreads system and run the SPLASH-2 applications. We also show that, for applications that have been tuned for the shared memory abstraction, the overhead introduced by the extra functionality of CableS affects the parallel section only in cases where data placement is affected by operating system (Windows NT) limitations in the granularity of virtual memory mappings.
