Harnessing Shared Wide-area Clusters for Dynamic High End Services

Distributed computing is moving toward wide-area clusters that are managed by different entities. In this paper, we introduce middleware-level support to facilitate computational resource sharing with service guarantees using non-dedicated server systems in wide-area clusters. The aim is to ensure that sets of computational tasks submitted to such high-end systems are completed reliably and in a timely fashion. Our approach enhances basic job scheduling with information about the execution history of, and trust values for, the computational nodes to which jobs are assigned. In essence, job scheduling is enriched with trust models constructed and maintained at runtime, and scheduling decisions are based on metrics that capture trust in remote server systems. An implementation of the approach is evaluated on PlanetLab, with initial results demonstrating good success rates in completing jobs within their service level agreements (SLAs), including under high system load. Additional results are obtained with a variant of the scheduling algorithm that uses redundancy to further improve the likelihood of meeting end-user SLAs. A representative application considered in this paper is remote data visualization, where substantial computation must be applied to data before it is displayed to end users. Here, SLAs capture the desired end-to-end delay, and distributed server or cluster systems are used to perform the required computations in a timely manner.
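To make the idea of trust-enriched scheduling concrete, the following is a minimal sketch only, not the paper's algorithm. It assumes a trust value can be approximated by a smoothed ratio of jobs a node completed within their SLA, and that the redundant variant simply dispatches a job to the top-ranked nodes. The names `NodeTrust`, `pick_nodes`, and the node labels are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeTrust:
    """Hypothetical per-node trust record built from execution history."""
    met: int = 0      # jobs that finished within their SLA
    missed: int = 0   # jobs that failed or finished late

    def score(self) -> float:
        # Assumed metric: Laplace-smoothed success ratio, so a node with
        # no history starts at 0.5 instead of 0 or 1.
        return (self.met + 1) / (self.met + self.missed + 2)

    def record(self, within_sla: bool) -> None:
        # Update the execution history at runtime, after each job.
        if within_sla:
            self.met += 1
        else:
            self.missed += 1


def pick_nodes(trust: Dict[str, NodeTrust], replicas: int = 1) -> List[str]:
    """Return the `replicas` most trusted nodes (ties broken by name)."""
    ranked = sorted(trust, key=lambda name: (-trust[name].score(), name))
    return ranked[:replicas]


# Illustrative history: node-a has been reliable, node-c has not.
trust = {name: NodeTrust() for name in ("node-a", "node-b", "node-c")}
for _ in range(8):
    trust["node-a"].record(True)
trust["node-b"].record(False)
trust["node-b"].record(True)
trust["node-c"].record(False)

primary = pick_nodes(trust)                # single assignment
redundant = pick_nodes(trust, replicas=2)  # redundant variant: two copies
```

In this sketch a job would go to `primary[0]`, while the redundant variant would send copies to every node in `redundant` and accept the first result that arrives within the SLA. A real scheduler would also need to weigh current load and decay old history, which this sketch leaves out.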
