The ANL/IBM SP Scheduling System

Over the past five years, scientists discovered that modern UNIX workstations connected by Ethernet and fiber networks could provide enough computational performance to compete with the supercomputers of the day. As this approach became increasingly popular, the need for distributed queuing and scheduling systems became apparent, and systems such as DQS from Florida State University were developed and worked very well. Today, supercomputers such as Argonne National Laboratory's IBM SP system provide more CPU and networking speed than can be obtained from these networks of workstations. Because these modern supercomputers look like clusters of workstations, however, developers felt that the scheduling systems previously used on workstation clusters should still apply. Attempts to apply some of these scheduling systems to Argonne's SP environment made it obvious that the two computing environments have very different scheduling needs. Recognizing this gap and realizing that no one had addressed it, I developed a new scheduling system. The approach taken in creating this system was unique in that user input and interaction were encouraged throughout the development process. The result was a scheduler that actually “worked” the way the users wanted it to.