Remote Control: Distributed Application Configuration, Management, and Visualization with Plush

Support for distributed application management in large-scale networked environments remains in its early stages. Although a number of solutions exist for subtasks of application deployment, monitoring, and maintenance in distributed environments, few tools provide a unified framework for application management. Many existing tools address the management needs of a single type of application or service running in a specific environment, and they are not adaptable enough to be used for other applications or platforms. To address these limitations, we present the design and implementation of Plush, a fully configurable application management infrastructure designed to meet the general requirements of several different classes of distributed applications. Plush allows developers to explicitly define the flow of control needed by their computations using application building blocks. Through an extensible resource management interface, Plush supports execution in a variety of environments, including both live deployment platforms and emulated clusters. Plush also uses relaxed synchronization primitives to improve fault tolerance and liveness in failure-prone environments. To show how Plush manages different classes of distributed applications, we take a closer look at specific applications and evaluate how Plush supports each.
