Node-Capability-Aware Replica Management for Peer-to-Peer Grids

Data objects have to be replicated in large-scale distributed systems for reasons of fault tolerance, availability, and performance. Furthermore, computations may have to be scheduled on these objects when they are part of a grid computation. Although replication mechanisms for unstructured peer-to-peer (P2P) systems can place replicas on capable nodes, they may not be able to provide deterministic guarantees on searching. Replication mechanisms in structured P2P systems provide deterministic guarantees on searching but do not address node capability in replica placement. We propose Virat, a node-capability-aware P2P middleware for managing replicas in large-scale distributed systems. Virat uses a two-layered architecture that builds a structured overlay over an unstructured P2P layer, combining the advantages of both structured and unstructured P2P systems. We compare its performance in detail with a replication mechanism realized over OpenDHT, a state-of-the-art structured P2P system. In a test bed built specifically for this comparison, the 99th percentile response time for Virat does not exceed 600 ms, whereas for OpenDHT it goes beyond 2000 ms.
