The Beehive Cluster System

In this writeup, we present the system architecture of Beehive, a cluster system we are developing at Georgia Tech to support interactive applications and compute-intensive servers. The system provides a shared memory programming environment on a cluster of workstations connected by a high-speed interconnect. The principal design features of Beehive are a global address space across the cluster (Section 5), a configurable access granularity (Section 6), flexibility in supporting spatial and temporal notions of synchronization and consistency (Section 7), and multithreading (Section 4). The fundamental design principle is to use only commodity hardware and software components as the basis, and to build the shared memory system architecture entirely in software. The mechanisms for shared memory parallel programming are made available to the application programmer through library calls.

In designing the system architecture of Beehive, we consciously target application domains that we expect to be ideally suited to cluster parallel computing. In particular, we base our design on our understanding of the requirements of interactive applications such as virtual environments, on our work on the storage architecture of database servers [9, 10], and on our experience with parallel computing in scientific domains [24, 30, 32].

Figure 1 depicts the current organization of the Beehive cluster. Each box of the cluster can be a uniprocessor or an SMP; the current design does not address heterogeneity in processor architectures. The interconnect can be built from any commodity network switch, as long as it has latency properties suitable for shared memory style communication. Beehive requires two things from the operating system: a network file system, and the ability to specify a virtual address range during memory allocation, a feature that is easily implemented in most Unix operating systems (e.g., using the mmap system call). Beyond these two requirements, a thread-aware operating system would be a plus for supporting the multithreading features of Beehive.
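The address-range requirement is what allows every node to place the shared region at the same virtual address, so that a pointer into shared data can name the same datum on every node. The following is a minimal C sketch of that single step, assuming a 64-bit POSIX system that provides MAP_ANONYMOUS. The names GLOBAL_BASE, GLOBAL_SIZE and reserve_global_region, and the chosen base address and size, are hypothetical illustrations and not part of Beehive's library interface.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical cluster-wide constants: every node would be built with the
 * same values so that an address inside the shared region refers to the
 * same datum on every node. Assumes a 64-bit address space. */
#define GLOBAL_BASE ((uintptr_t)0x600000000000ULL)
#define GLOBAL_SIZE ((size_t)64 * 1024 * 1024) /* 64 MiB */

/* Reserve [base, base + size) at exactly that virtual address.
 * Returns base on success, or NULL if that range is unavailable. */
static void *reserve_global_region(uintptr_t base, size_t size) {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    /* mmap works in whole pages: require an aligned, page-multiple range. */
    assert(base % (uintptr_t)page == 0 && size % (size_t)page == 0);

    /* Pass base as a hint instead of using MAP_FIXED, because MAP_FIXED
     * would silently replace anything already mapped in that range. */
    void *p = mmap((void *)base, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t)p != base) {
        /* The kernel placed the mapping elsewhere: undo it and fail. */
        munmap(p, size);
        return NULL;
    }
    return p;
}
```

In this sketch each node would call reserve_global_region(GLOBAL_BASE, GLOBAL_SIZE) at startup and refuse to join the cluster if it returns NULL. Using an address hint rather than MAP_FIXED trades a possible startup failure for never clobbering an existing mapping; on Linux 4.17 and later, MAP_FIXED_NOREPLACE expresses the same intent directly, though older kernels treat that flag as a plain hint, so the result check is still worthwhile.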

[1] Anna R. Karlin et al. Implementing global memory management in a workstation cluster. SOSP, 1995.

[2] Anoop Gupta et al. Memory consistency and event ordering in scalable shared-memory multiprocessors. Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990.

[3] Umakishore Ramachandran et al. Transient versioning for consistency and concurrency in client-server systems. Fourth International Conference on Parallel and Distributed Information Systems, 1996.

[4] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Transactions on Computers, 1979.

[5] A. Gupta et al. The Stanford FLASH multiprocessor. Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994.

[6] Willy Zwaenepoel et al. Implementation and performance of Munin. SOSP '91, 1991.

[7] Steve R. Kleiman et al. SunOS Multi-thread Architecture. USENIX Winter, 1991.

[8] James R. Larus et al. The Wisconsin Wind Tunnel: virtual prototyping of parallel computers. SIGMETRICS '93, 1993.

[9] Seth Copen Goldstein et al. Active Messages: A Mechanism for Integrated Communication and Computation. Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992.

[10] James R. Larus et al. Tempest and Typhoon: user-level shared memory. ISCA '94, 1994.

[11] Joonwon Lee et al. Architectural primitives for a scalable shared memory multiprocessor. SPAA '91, 1991.

[12] Kourosh Gharachorloo et al. Shasta: a low overhead, software-only approach for supporting fine-grain shared memory. ASPLOS VII, 1996.

[13] Mary K. Vernon et al. A Hybrid Shared Memory/Message Passing Parallel Machine. 1993 International Conference on Parallel Processing (ICPP '93), 1993.

[14] Dean M. Tullsen et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. 23rd Annual International Symposium on Computer Architecture (ISCA '96), 1996.

[15] David E. Culler et al. Analysis of multithreaded architectures for parallel computing. SPAA '90, 1990.

[16] Jessica K. Hodgins et al. Temporal notions of synchronization and consistency in Beehive. SPAA '97, 1997.

[17] Umakishore Ramachandran et al. The Quest for a Zero Overhead Shared Memory Parallel Machine. ICPP, 1995.

[18] Anand Sivasubramaniam et al. Architectural Mechanisms for Explicit Communication in Shared Memory Multiprocessors. SC, 1995.

[19] David R. Cheriton et al. Logged virtual memory. SOSP, 1995.

[20] Anoop Gupta et al. Integration of message passing and shared memory in the Stanford FLASH multiprocessor. ASPLOS VI, 1994.

[21] Umakishore Ramachandran et al. Relaxed index consistency for a client-server database. Proceedings of the Twelfth International Conference on Data Engineering, 1996.

[22] Anand Sivasubramaniam et al. A Simulation-Based Scalability Study of Parallel Systems. J. Parallel Distributed Comput., 1994.

[23] Alan L. Cox et al. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. USENIX Winter, 1994.

[24] Charles L. Seitz et al. Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, 1995.

[25] Liviu Iftode et al. Understanding Application Performance on Shared Virtual Memory Systems. 23rd Annual International Symposium on Computer Architecture (ISCA '96), 1996.

[26] Karsten Schwan et al. Indigo: user-level support for building distributed shared abstractions. Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, 1995.

[27] Charles L. Seitz et al. Myrinet: A Gigabit-per-Second Local. 1995.

[28] James R. Larus et al. Fine-grain access control for distributed shared memory. ASPLOS VI, 1994.

[29] Michael L. Scott et al. Software cache coherence for large scale multiprocessors. Proceedings of the 1st IEEE Symposium on High Performance Computer Architecture, 1995.

[30] Per Stenström et al. Using Write Caches to Improve Performance of Cache Coherence Protocols in Shared-Memory Multiprocessors. J. Parallel Distributed Comput., 1995.

[31] Thorsten von Eicken et al. U-Net: a user-level network interface for parallel and distributed computing. SOSP, 1995.

[32] Scott Pakin et al. High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet. Proceedings of the IEEE/ACM SC95 Conference, 1995.

[33] Anant Agarwal et al. Integrating message-passing and shared-memory: early experience. SIGP, 1993.

[34] Michael C. Browne et al. The S3.mp scalable shared memory multiprocessor. Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, 1994.