Transparent fault tolerance for Java remote method invocation

The Java platform provides a number of attractive features (networking, remote method invocation, dynamic class loading, multi-threading and architecture independence) that make it well suited to developing distributed applications. In addition, Java's easy-to-use programming model and inherent portability make it possible to build distributed systems in a timely, efficient and extensible manner, thereby reducing the time-to-market and the costs associated with application development. Java Remote Method Invocation (JavaRMI) is the primary model for distributed computing in Java. However, while JavaRMI promotes access transparency and location transparency of remote servers to clients, it does not provide fault tolerance mechanisms to render faults transparent to the application. Instead, a fault in the system is exposed to the application, requiring application programmers to provide additional mechanisms to ensure correct, reliable and highly available operation, even in the presence of faults. Because application programmers are not necessarily fault tolerance experts, this approach is error-prone and introduces unnecessary complexity into the application. This dissertation research focuses on the development of the Aroma System, replication middleware that provides the fault tolerance that JavaRMI lacks. Aroma is deployed and exploited in a manner that is transparent to the application, requiring only minimal modifications to either the application or the JavaRMI infrastructure. This transparency is achieved through the use of interceptors, software mechanisms that we have developed to capture the network-bound traffic generated by the application and divert it silently to the replication middleware. The Aroma System provides fault tolerance through the consistent replication of JavaRMI objects, and provides mechanisms to support both active and passive replication styles.
Strong replica consistency is maintained under both fault-free and recovery conditions. The replication middleware also provides mechanisms to handle, transparently, the inherent non-determinism of the JavaRMI model, and provides fault tolerance for JavaRMI applications that communicate over the Java Remote Method Protocol (JRMP).
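
To make concrete what it means for a fault to be exposed to the application, the following is a minimal sketch of the kind of hand-written failover logic that a JavaRMI client might need in the absence of replication middleware. It is not taken from the Aroma System. The Counter interface, the two stand-in replica classes and the incrementWithFailover helper are hypothetical names introduced purely for illustration, and the replicas are simulated locally rather than exported over JRMP.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.Arrays;
import java.util.List;

public class ManualFailoverSketch {

    /** Hypothetical remote interface: each remote method declares RemoteException. */
    public interface Counter extends Remote {
        int increment() throws RemoteException;
    }

    /** Local stand-in for a stub whose server has crashed (assumption for the demo). */
    static class CrashedReplica implements Counter {
        @Override
        public int increment() throws RemoteException {
            throw new RemoteException("replica unreachable");
        }
    }

    /** Local stand-in for a stub whose server is alive (assumption for the demo). */
    static class LiveReplica implements Counter {
        private int value;

        @Override
        public int increment() {
            return ++value;
        }
    }

    /** Hand-written failover: try each replica in order until one answers. */
    static int incrementWithFailover(List<Counter> replicas) throws RemoteException {
        RemoteException last = null;
        for (Counter replica : replicas) {
            try {
                return replica.increment();
            } catch (RemoteException e) {
                // The fault surfaces here, in application code.
                last = e;
            }
        }
        throw new RemoteException("all replicas failed", last);
    }

    public static void main(String[] args) throws RemoteException {
        List<Counter> replicas = Arrays.<Counter>asList(new CrashedReplica(), new LiveReplica());
        System.out.println(incrementWithFailover(replicas));
    }
}
```

Even this small sketch only masks a crash of the first replica. It does nothing to keep the replicas' state consistent with one another, which is the kind of concern a replication middleware layer is meant to take off the application programmer's hands.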