Running MPI Applications over an Opportunistic Infrastructure

We propose a method based on Open MPI and BLCR (Berkeley Lab Checkpoint/Restart) checkpoints for executing MPI applications on non-dedicated, failure-prone computing infrastructures. The method automatically detects failures and recovers the affected MPI applications while adding minimal overhead to the overall execution. We tested it on UnaCloud, an opportunistic Cloud Computing IaaS implementation that provides private clouds built on idle computing resources in the computer laboratories of a university campus. The tests executed a Simple Ray Tracing MPI application whose rendering operations required several hours of processing and intercommunication among nodes. The results show that, through checkpoint/restart recovery, the proposed method can effectively run MPI applications even when the supporting infrastructure exhibits high volatility.
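The general checkpoint/restart recovery cycle (run, checkpoint periodically, detect a failure, resume from the last checkpoint) can be illustrated with a minimal sketch. This is not the authors' implementation. BLCR checkpoints whole process images transparently, whereas this hypothetical Python sketch swaps in application-level checkpoints of a small state dictionary. The names `run_job` and `NodeReclaimed`, the JSON checkpoint file, and the simulated failure points are all illustrative assumptions standing in for lab machines being reclaimed by their users.

```python
import json
import os
import tempfile


class NodeReclaimed(Exception):
    """Hypothetical signal that a simulated desktop machine was reclaimed."""


def run_job(total_units, checkpoint_every, failures, ckpt_path):
    """Run `total_units` work steps, checkpointing state to `ckpt_path`.

    `failures` holds step indices at which the node is lost; each fires once,
    as if the job were redeployed on another idle machine afterwards.
    Returns (result, restarts, recomputed_steps).
    """
    pending_failures = set(failures)
    restarts = 0
    executed = 0
    while True:
        # Restart path: resume from the last checkpoint if one exists.
        if os.path.exists(ckpt_path):
            with open(ckpt_path) as f:
                state = json.load(f)
        else:
            state = {"next": 0, "acc": 0}
        try:
            while state["next"] < total_units:
                step = state["next"]
                if step in pending_failures:
                    pending_failures.discard(step)
                    raise NodeReclaimed(step)
                state["acc"] += step * step  # stand-in for rendering one row
                state["next"] = step + 1
                executed += 1
                if state["next"] % checkpoint_every == 0:
                    # Write-then-rename so a crash mid-write keeps the old checkpoint.
                    tmp = ckpt_path + ".tmp"
                    with open(tmp, "w") as f:
                        json.dump(state, f)
                    os.replace(tmp, ckpt_path)
            return state["acc"], restarts, executed - total_units
        except NodeReclaimed:
            restarts += 1  # failure detected: loop back and restart


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_job(100, 10, {25, 73}, os.path.join(d, "job.ckpt")))
        # prints (328350, 2, 8)
```

With a checkpoint every 10 steps, the failure at step 25 forces steps 20 to 24 to be recomputed and the failure at step 73 forces steps 70 to 72, so only 8 of 108 executed steps are wasted. The checkpoint interval trades this recomputation against the cost of writing checkpoints, which is the overhead the method aims to keep small.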