P2P-MPI : A fault-tolerant Message Passing Interface Implementation for Grids

This thesis aims to demonstrate that message-passing parallel programs can be deployed onto large, heterogeneous distributed systems. The work consists of the design and development of a proof-of-concept middleware named P2P-MPI, released under a public license. P2P-MPI eases such deployment by proposing a peer-to-peer based platform in which available resources are dynamically discovered upon job requests, and by providing a fault-tolerant message-passing library for Java programs.

The motivation for this project is to offer a programming environment which is:
- integrated: it embeds both a middleware layer and a communication library,
- in Java (for its "run everywhere" feature),
- light-weight, to encourage average users to get a good grasp of it: it is contained in a single jar and runs in user space.

We have also integrated in P2P-MPI contributions to major issues in the field:
* Message-passing programming model: P2P-MPI integrates an important subset of MPJ (we are able to pass the Java Grande Forum benchmark). In that respect, we faced the same problems as Ibis or MPJ Express. We currently have two implementations: the first uses TCP sockets only and a limited port range (to accommodate firewalls). More recently, we added a higher-performance implementation using Java NIO.
* Fault-tolerance: we provide some fault-tolerance through replication of computations. A number of copies of each process may be requested to run simultaneously at runtime (this mechanism is user-friendly because the user simply chooses the replication degree through a command line argument). So, unlike a standard MPI application, which crashes as soon as any of its processes crashes, a program using replication is able to continue as long as at least one copy of each process is running.
* Scalability: the goal is to scale to hundreds of nodes, especially over geographically distributed resources.
Since the beginning of the project, we have based the middleware on a P2P layer to avoid single points of failure in resource discovery. Another key design feature is the fault-detection mechanism, based on the principle of failure detectors. A major issue when scaling is the time needed to detect a failure. We have extensively studied the behavior of failure-detection algorithms in P2P-MPI and assessed their effectiveness in real experiments.
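
The survival condition stated for replication (the program continues as long as at least one copy of each process is running) can be illustrated with a minimal Java sketch. This is not P2P-MPI code: the class name ReplicationSketch, the method canContinue, and the representation of replica liveness as boolean flags are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of the P2P-MPI API.
public class ReplicationSketch {

    // alive.get(r) holds one flag per replica of logical process r
    // (true = that copy is still running). The application can continue
    // only if every logical process has at least one running copy.
    static boolean canContinue(List<boolean[]> alive) {
        for (boolean[] replicas : alive) {
            boolean anyAlive = false;
            for (boolean up : replicas) {
                if (up) {
                    anyAlive = true;
                    break;
                }
            }
            if (!anyAlive) {
                return false; // all copies of this process are lost
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two logical processes, replication degree 2 (assumed values).
        List<boolean[]> oneCopyLostEach = Arrays.asList(
                new boolean[] {true, false},
                new boolean[] {false, true});
        List<boolean[]> processZeroLost = Arrays.asList(
                new boolean[] {false, false},
                new boolean[] {true, true});

        System.out.println(canContinue(oneCopyLostEach));
        System.out.println(canContinue(processZeroLost));
    }
}
```

With a replication degree of 1, every process has a single copy, so the check reduces to the usual MPI behavior: the loss of any one process stops the application.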