A distributed virtual hypercube algorithm for maintaining scalable and dynamic network overlays

Network overlays support the execution of distributed applications, hiding lower-level protocols and the physical topology. This work presents DiVHA, a distributed virtual hypercube algorithm for constructing and maintaining a self-healing overlay network based on a virtual hypercube. DiVHA keeps logarithmic properties even when the number of nodes is not a power of two, making it a scalable alternative for connecting distributed resources. DiVHA assumes a dynamic fault situation in which nodes fail and recover continuously, leaving and rejoining the system. The algorithm is formally specified, and the latency for detecting changes and subsequently reconstructing the topology is proved to be bounded. An overlay network based on DiVHA, called HyperBone, was implemented and deployed on PlanetLab. HyperBone offers services such as monitoring and routing, allowing the execution of Grid applications across the Internet. HyperBone also includes a procedure for detecting groups of stable nodes, which allowed the execution of parallel applications on a virtual hypercube built on top of PlanetLab. Copyright © 2014 John Wiley & Sons, Ltd.
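
As background for the hypercube structure mentioned above, the following is a minimal sketch of plain hypercube adjacency when the number of nodes n is not a power of two. It is not DiVHA itself: the abstract does not describe how DiVHA compensates for missing or faulty nodes, so this sketch simply omits neighbors whose identifiers do not exist. The function names `dimension` and `neighbors` and the assumption that nodes are numbered 0..n-1 are illustrative choices, not taken from the paper.

```python
def dimension(n: int) -> int:
    """Smallest d such that 2**d >= n (assumes n >= 1)."""
    return (n - 1).bit_length()


def neighbors(node: int, n: int) -> list[int]:
    """Hypercube neighbors of `node` among ids 0..n-1.

    In a d-dimensional hypercube, two nodes are adjacent when their
    binary ids differ in exactly one bit. Ids >= n do not exist in an
    incomplete hypercube, so those neighbors are skipped here
    (an illustrative simplification, not DiVHA's repair rule).
    """
    d = dimension(n)
    candidates = (node ^ (1 << k) for k in range(d))
    return [c for c in candidates if c < n]


if __name__ == "__main__":
    n = 5  # not a power of two; ids fit in d = 3 bits
    for node in range(n):
        print(node, neighbors(node, n))
```

With n = 5, each node has at most 3 neighbors, which illustrates why the node degree of a hypercube grows logarithmically with the number of nodes.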
