A case for conservative workload modeling: Parallel job scheduling with daily cycles of activity

Computer workloads have many attributes. When modeling these workloads it is often difficult to decide which attributes are important, and which can be abstracted away. In many cases, the modeler only includes attributes that are believed to be important, and ignores the rest. We argue, however, that this can lead to impaired workloads and unreliable system evaluations. Using parallel job scheduling as a case study, and daily cycles of activity as the attribute in dispute, we present two schedulers whose simulated performance seems identical without cycles, but then becomes significantly different when daily cycles are included in the workload. We trace this to the ability of one scheduler to prioritize interactive jobs, which leads to implicitly delaying less critical work to nighttime, when it can utilize resources that otherwise would have been left idle. Notably, this was not a design feature of this scheduler, but rather an emergent property that was not anticipated in advance.
