The ISIS Project: Real Experience with a Fault Tolerant Programming System

Abstract: The ISIS project has developed a distributed programming toolkit and a collection of higher-level applications based on these tools. ISIS is now in use at more than 300 locations worldwide. Here, we discuss the lessons (and surprises) gained from this experience with the real world. ISIS differs from other process-group-based systems because it integrates group membership changes with communication, and because of the multicast communication primitives we call CBCAST and ABCAST. Virtual synchrony underlies those aspects of ISIS that have been most successful. The approach makes it possible for a process to infer the state and actions of remote processes using local state information and events that have been locally observed. Our experience confirms that, using this property, one can often arrive at elegant, efficient solutions to problems that would be difficult to formulate, and extremely complex to implement, on a bare message-passing system.