WOW: Self-Organizing Wide Area Overlay Networks of Virtual Workstations

This paper describes WOW, a distributed system that combines virtual machine, overlay networking, and peer-to-peer techniques to create scalable wide-area networks of virtual workstations for high-throughput computing. The system is architected to facilitate the addition of nodes to a pool of resources through the use of system virtual machines (VMs) and self-organizing virtual network links; to maintain IP connectivity even if VMs migrate across network domains; and to present to end users and applications an environment that is functionally identical to a local-area network or cluster of workstations. We describe a novel, extensible, user-level decentralized technique to discover, establish, and maintain overlay links that tunnel IP packets over different transports (including UDP and TCP) and across firewalls. We also report on several experiments conducted on a testbed WOW deployment with 118 P2P router nodes on PlanetLab and 33 VMware-based VM nodes distributed across six firewalled domains. The experiments show that the latency of joining a WOW network is on the order of seconds: in a set of 300 trials, 90% of the nodes self-configured P2P routes within 10 seconds, and more than 99% established direct connections to other nodes within 200 seconds. They also show that the testbed delivers good performance for two unmodified, representative benchmarks drawn from the life-sciences domain. The testbed WOW achieves an overall throughput of 53 jobs/minute for PBS-scheduled executions of the MEME application (with an average single-job sequential running time of 24.1 s) and a parallel speedup of 13.5 for the PVM-based fastDNAml application. Finally, the experiments demonstrate that the system seamlessly maintains connectivity at the virtual IP layer for typical client/server applications (NFS, SSH, PBS) when VMs migrate across a WAN.
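To put the MEME throughput figure in perspective, the following minimal Python sketch converts the two numbers quoted above (53 jobs/minute and a 24.1 s average sequential job time) into an effective degree of parallelism, that is, how many seconds of sequential work the testbed completes per wall-clock second. The function names are illustrative only, and the utilization estimate rests on the assumption, not stated in the abstract, that all 33 VM nodes ran jobs and were equally fast.

```python
# Figures quoted in the abstract.
JOBS_PER_MINUTE = 53.0      # overall MEME throughput under PBS
MEAN_JOB_SECONDS = 24.1     # average single-job sequential running time

# Assumption (not stated in the abstract): every one of the 33 VM nodes
# executed jobs and all nodes were equally fast.
ASSUMED_WORKERS = 33


def effective_parallelism(jobs_per_minute: float, mean_job_seconds: float) -> float:
    """Seconds of sequential work completed per wall-clock second."""
    return jobs_per_minute * mean_job_seconds / 60.0


def utilization(parallelism: float, workers: int) -> float:
    """Fraction of the assumed worker pool kept busy on average."""
    return parallelism / workers


parallelism = effective_parallelism(JOBS_PER_MINUTE, MEAN_JOB_SECONDS)
busy_fraction = utilization(parallelism, ASSUMED_WORKERS)

print(f"effective parallelism: {parallelism:.1f}")
print(f"utilization under the 33-worker assumption: {busy_fraction:.0%}")
```

Under these assumptions the throughput corresponds to roughly 21 jobs' worth of sequential work in flight at any time. The utilization figure is only a rough indication, since the abstract does not say how many nodes served as workers or how their speeds compare.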
