A Pure Peer-To-Peer Desktop Grid framework with efficient fault tolerance

P2P computing is the sharing of computer resources by direct exchange. A P2P desktop grid is a P2P computing environment composed of desktop resources, usually built on the Internet infrastructure. The most important challenges for a P2P desktop grid are: 1) minimizing reliance on central servers to achieve decentralization, 2) providing interoperability with other platforms, 3) providing interaction methods between grid nodes that overcome connectivity problems in the Internet environment, and 4) providing efficient fault tolerance to maintain performance under frequent faults. The main objective of this paper is to introduce a pure P2P desktop grid framework built on Microsoft's .NET technology. The proposed framework is composed of the following components: 1) a communication protocol based on both FTP and HTTP for interaction between grid nodes, to provide interoperability, 2) an efficient checkpointing approach to provide fault tolerance, and 3) four interaction models that implement connectivity for both serial and parallel execution. The framework involves no reliance on central servers. Such a framework helps overcome the problems associated with decentralization, interoperability, connectivity, and fault tolerance. Performance was evaluated by running an application based on matrix multiplication with variable dimensions on a desktop grid built on the proposed framework. The experiments focused on measuring the impact of failures on execution time for the different connectivity models. Experimental results show that using the proposed framework as an infrastructure for running distributed applications considerably improves fault tolerance, in addition to achieving full decentralization and interoperability and solving connectivity problems.
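
To make the role of checkpointing in such an experiment concrete, the following is a minimal Python sketch of a matrix multiplication that saves its progress after every row and resumes from the last saved row after a fault. It is not the paper's checkpointing approach, whose details the abstract does not give: the row-level granularity, the JSON file format, and all names (`multiply_with_checkpoints`, `SimulatedFault`, `fail_after`, `demo`) are assumptions made purely for illustration.

```python
import json
import os
import tempfile


class SimulatedFault(Exception):
    """Stands in for a node failure in this sketch (hypothetical)."""


def load_checkpoint(path):
    """Return the result rows saved so far, or an empty list if none exist."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, rows):
    """Write to a temporary file, then rename, so a crash cannot leave a
    half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(rows, f)
    os.replace(tmp, path)


def multiply_with_checkpoints(a, b, path, fail_after=None):
    """Compute the product of matrices a and b row by row.

    Work resumes from the checkpoint stored at `path`. `fail_after` is a
    hypothetical test hook: once that many rows have been computed in this
    call, a SimulatedFault is raised.
    """
    rows = load_checkpoint(path)
    cols_b = list(zip(*b))  # columns of b, computed once
    done_this_call = 0
    for i in range(len(rows), len(a)):
        if fail_after is not None and done_this_call == fail_after:
            raise SimulatedFault(f"node failed before row {i}")
        rows.append([sum(x * y for x, y in zip(a[i], col)) for col in cols_b])
        save_checkpoint(path, rows)
        done_this_call += 1
    return rows


def demo():
    """Run once with an injected fault, then resume to completion."""
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    path = os.path.join(tempfile.mkdtemp(), "matmul.ckpt")
    try:
        multiply_with_checkpoints(a, b, path, fail_after=2)
    except SimulatedFault:
        pass  # the first two rows are already on disk
    rows_saved = len(load_checkpoint(path))
    result = multiply_with_checkpoints(a, b, path)  # recomputes only row 2
    return rows_saved, result
```

In this sketch the second call recomputes only the row that was lost, so the cost of a fault is bounded by the work done since the last checkpoint. Finer or coarser granularity trades checkpoint overhead against the amount of recomputation.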
