Taxonomy of Contention Management in Interconnected Distributed Systems

Interconnected distributed computing systems, such as computing Grids and federated Clouds, are of particular importance in both industry and academia. Resources in these environments are usually shared among users from different groups and/or organizations, which makes the environments prone to contention between user requests for access to resources. In particular, resource contention occurs when a user request cannot be admitted, or cannot obtain sufficient access to resources, because those resources are occupied by other requests. In this paper, we examine the different types of resource contention that occur in interconnected distributed systems and the approaches for resolving them. These approaches are similar in many respects and differ in others. We investigate their features, and identify and categorize their similarities and differences. Additionally, we review various resource management systems for interconnected distributed systems and group them according to the identified specifications.
