The AS/400 cluster engine: A case study

In this paper we share our experience in harnessing group communication for the AS/400 cluster infrastructure. The main function of the AS/400 cluster is to provide high availability and disaster recovery for defined cluster resources. The cluster supports up to 128 nodes, connected via any IP network. Cluster nodes and resources can be added or removed dynamically. Administrative actions or failures trigger automatic cluster reconfigurations. CLUE (Cluster Engine) is the group communication middleware implemented in the OS/400 kernel. It exploits the strong virtual synchrony model. CLUE is unique in two respects: it is the first virtually synchronous group communication system incorporated into a cluster solution and customized to meet cluster needs, and it is the only group communication system (GCS) integrated into a commercial operating system kernel. CLUE's special features (e.g., Flexible Group Member) can be relevant to other group communication systems.
