An extensible job scheduling system for massively parallel processor architectures

Over the last five years, scientists have found that modern UNIX workstations connected by Ethernet and high-performance networks can deliver enough computational performance to compete with the supercomputers of the day. Today's supercomputer systems, such as the IBM SP, provide more CPU and network bandwidth than is obtainable from networks of workstations (NOWs). Because the IBM SP is itself made up of individual workstation-class processors connected by a high-bandwidth switch network, scheduler developers assumed that the scheduling systems previously used on NOWs would still apply. It soon became obvious to the many sites that purchased massively parallel processor (MPP) systems that this was not the case. Recognizing the urgent need for a job scheduling system that works well in an MPP environment, I started developing the Extensible Argonne Scheduling sYstem (EASY). Development followed a unique approach in which users were encouraged to make suggestions and to report any inconsistencies between EASY's actual and documented behavior. As EASY became more widely used and IBM SP systems grew much larger, several scalability problems had to be addressed. The main one arose because EASY performed resource management as well as job scheduling. After discussions with IBM on how to address these issues, we settled on an application programming interface (API) to their LoadLeveler product. LoadLeveler did a poor job of scheduling large parallel jobs but did provide scalable resource management. The EASY-LoadLeveler project was started to combine the best features of both scheduling systems. Testing the scalability of EASY-LoadLeveler required a much larger system than I had access to at Argonne. The Cornell Theory Center offered not only the largest IBM SP in existence at the time but also a very close working relationship with IBM, making it an ideal place to do this research.
Within a year, we had the first version up and running at the Theory Center. This work has since been extended to include a new deterministic heterogeneous scheduling algorithm for systems whose nodes have different resources.
