When the Herd Is Smart: Aggregate Behavior in the Selection of Job Request

In most parallel supercomputers, submitting a job for execution involves specifying how many processors are to be allocated to the job. When the job is moldable (i.e., there is a choice in how many processors the job uses), an application scheduler called SA can significantly improve job performance by automatically selecting how many processors to request. Since most jobs are moldable, this result has great impact on the current state of practice in supercomputer scheduling. However, widespread use of SA can change the nature of the workload processed by supercomputers. When many SAs are scheduling jobs on one supercomputer, the decision made by one SA affects the state of the system, thereby impacting the other instances of SA. In this case, the global behavior of the system emerges from the aggregate behavior of all SAs. In particular, it is reasonable to expect the competition for resources to become tougher with multiple SAs, and this tougher competition to decrease the performance improvement attained by each SA individually. This paper investigates this very issue. We find that the increased competition indeed makes it harder for each individual instance of SA to improve job performance. Nevertheless, two other aggregate behaviors override the increased competition when the system load is moderate to heavy. First, as load goes up, SA chooses smaller requests, which increases efficiency, which effectively decreases the offered load, which in turn mitigates long wait times. Second, better job packing and fewer jobs in the system make it easier for incoming jobs to fit into the supercomputer schedule, reducing wait times further. As a result, under moderate to heavy load, a single instance of SA benefits from the fact that other jobs are also using SA.
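The request-selection idea described above can be sketched as a simple optimization: for a moldable job, evaluate each candidate processor count and pick the one with the smallest predicted turnaround time (predicted queue wait plus execution time). The speedup model and wait-time predictor below are illustrative assumptions for the sketch, not the paper's actual formulas; note how, as the load parameter grows, the wait penalty pushes the choice toward smaller requests.

```python
# A minimal sketch of an SA-style request selector, assuming toy models for
# speedup and queue wait. Everything here is hypothetical illustration.

def speedup(n, max_parallelism=64):
    # Toy sublinear speedup: linear up to max_parallelism, flat afterwards.
    return min(n, max_parallelism)

def predicted_wait(n, load=0.7):
    # Toy wait-time predictor: larger requests wait longer, more so under load.
    return load * n

def select_request(sequential_runtime, candidates, load=0.7):
    """Return the candidate processor count with the smallest predicted
    turnaround time (queue wait + execution time)."""
    def turnaround(n):
        return predicted_wait(n, load) + sequential_runtime / speedup(n)
    return min(candidates, key=turnaround)

if __name__ == "__main__":
    # A one-hour sequential job choosing among power-of-two requests.
    best = select_request(sequential_runtime=3600.0,
                          candidates=[1, 2, 4, 8, 16, 32, 64, 128])
    print(best)  # → 64
```

Under these toy models, raising `load` shifts the minimum toward smaller requests, which is the efficiency-increasing aggregate behavior the abstract describes.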
