eSciGrid: A P2P-based e-science Grid for scalable and efficient data sharing

E-science projects in various disciplines generate large amounts of data and face a fundamental challenge: thousands of researchers want to obtain new scientific results by logically relating subsets of the total data volume. Because the data are huge and widely distributed, e-science communities are investigating different technologies to provide fast access to the growing data sets. Among these technologies, Peer-to-Peer (P2P) and Data Grid are two models that fit these requirements well, because of their potential to provide a high quality of service at low cost. In this paper, we explore the possibility of using the P2P paradigm for data-intensive e-science applications on the Grid. We argue that additional support is required to achieve fast access to such data, and we propose eSciGrid to overcome the scalability barriers in today's e-science communities. eSciGrid allows e-science communities to achieve a high query throughput through a decentralized protocol that integrates caching with query processing. The protocol takes into account the physical distance between peers and the amount of traffic carried by each node. The result of this integration is constant complexity for moderate queries and fast data transfers between Grid peers. Our results show that eSciGrid increases the performance of data access on e-science Grids.
