Enabling High Data Throughput in Desktop Grids through Decentralized Data and Metadata Management: The BlobSeer Approach

Whereas traditional Desktop Grids rely on centralized servers for data management, recent work has enabled the distribution of large input data using peer-to-peer (P2P) protocols and Content Distribution Networks (CDN). We go a step further and propose a generic yet efficient data storage system that enables the use of Desktop Grids for applications with high output data requirements, where the access grain and the access patterns may be random. Our solution builds on a blob management service that lets a large number of concurrent clients efficiently read, write, and append to huge data objects that are fragmented and distributed at a large scale. Scalability under heavy concurrency is achieved through an original metadata scheme that uses a distributed segment tree built on top of a Distributed Hash Table (DHT). The proposed approach has been implemented, and its benefits have been demonstrated with our BlobSeer prototype on the Grid'5000 testbed.
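To make the metadata idea more concrete, the following is a minimal sketch of a segment tree whose nodes are stored as key-value pairs, with a plain Python dict standing in for the DHT. It is an illustration under stated assumptions, not BlobSeer's actual implementation: the names `build_tree` and `lookup`, the `(offset, size)` key layout, the provider labels, and the requirement that the blob size be a power-of-two multiple of the page size are all hypothetical. Versioning and concurrency control are omitted.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical key layout: each tree node is keyed by the byte range it covers.
NodeKey = Tuple[int, int]                       # (offset, size)
NodeValue = Union[str, Tuple[NodeKey, NodeKey]]  # leaf: provider id; inner: child keys


def build_tree(dht: Dict[NodeKey, NodeValue], offset: int, size: int,
               page: int, providers: Dict[int, str]) -> None:
    """Store a segment tree covering [offset, offset + size) in `dht`.

    Assumes `size` is a power-of-two multiple of `page`.
    """
    if size == page:
        # Leaf node: records which provider holds this page.
        dht[(offset, size)] = providers[offset // page]
        return
    half = size // 2
    left, right = (offset, half), (offset + half, half)
    # Inner node: records only the keys of its two children.
    dht[(offset, size)] = (left, right)
    build_tree(dht, offset, half, page, providers)
    build_tree(dht, offset + half, half, page, providers)


def lookup(dht: Dict[NodeKey, NodeValue], node: NodeKey,
           lo: int, hi: int) -> List[Tuple[int, str]]:
    """Return (page_offset, provider) for every page intersecting [lo, hi)."""
    offset, size = node
    if hi <= offset or lo >= offset + size:
        return []                               # no overlap: prune this subtree
    value = dht[node]                           # one DHT get per visited node
    if isinstance(value, str):
        return [(offset, value)]                # reached a leaf page
    left, right = value
    return lookup(dht, left, lo, hi) + lookup(dht, right, lo, hi)


# Usage: a 16-byte blob split into four 4-byte pages on four providers.
dht: Dict[NodeKey, NodeValue] = {}
build_tree(dht, 0, 16, 4, {0: "p0", 1: "p1", 2: "p2", 3: "p3"})
pages = lookup(dht, (0, 16), 5, 10)             # pages overlapping bytes 5..9
```

Because each node is an independent key-value pair, the nodes can be spread across DHT participants, and a range read only visits the subtrees that overlap the requested range rather than a central metadata server.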
