Livelock avoidance for Meta-schedulers

Meta-scheduling, a process that allows a user to schedule a job across multiple sites, has the potential for livelock. Current systems avoid livelock in one of two ways. Some lock down resources at multiple sites and let a meta-scheduler control them for the duration of the lockdown. Others limit jobs to a size that fits on a single site. The former approach leads to poor utilization; the latter limits job size. This research uses BYU's Meta-scheduler (YMS), which allows jobs to be scheduled across multiple sites without locking down nodes. YMS avoids livelock through exponential back-off. This research quantifies the potential for livelock, determines a suitable back-off period, and provides a framework on which to test theoretical local schedulers. The results show that livelock does occur and that a suitable exponential back-off not only avoids livelock but also reduces the scheduling time for each job.
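The abstract does not spell out the back-off rule YMS uses. The sketch below is a minimal illustration under stated assumptions. It uses a slotted model in the style of Ethernet's binary exponential back-off. An attempt to co-allocate nodes at several sites succeeds only if it is the sole attempt in its slot. Simultaneous attempts each grab part of the resources, conflict, release what they hold, and retry. The names `backoff_slots` and `simulate`, the slot model, and the `cap` parameter are hypothetical and do not come from YMS.

```python
import random
from typing import Dict, Optional


def backoff_slots(attempt: int, rng: random.Random, cap: int = 10) -> int:
    """Binary exponential back-off (hypothetical helper): after the attempt-th
    collision, wait a uniformly random number of slots in [0, 2**k - 1],
    where k = min(attempt, cap)."""
    k = min(attempt, cap)
    return rng.randint(0, 2 ** k - 1)


def simulate(n_schedulers: int, use_backoff: bool,
             max_slots: int = 1000, seed: int = 0) -> Dict[int, Optional[int]]:
    """Slotted model of meta-schedulers co-allocating nodes at several sites.

    Assumption: an attempt succeeds only if it is the sole attempt in its slot.
    Simultaneous attempts each grab part of the multi-site resources, find the
    rest held by a competitor, release what they hold, and retry later.
    Returns the slot in which each scheduler succeeded, or None if it never did.
    """
    rng = random.Random(seed)
    next_try = {s: 0 for s in range(n_schedulers)}    # slot of next attempt
    collisions = {s: 0 for s in range(n_schedulers)}  # conflicts seen so far
    done: Dict[int, Optional[int]] = {s: None for s in range(n_schedulers)}

    for slot in range(max_slots):
        contenders = [s for s in range(n_schedulers)
                      if done[s] is None and next_try[s] == slot]
        if len(contenders) == 1:
            done[contenders[0]] = slot  # sole attempt: co-allocation commits
        elif len(contenders) > 1:
            for s in contenders:        # conflict: release partial holds, retry
                collisions[s] += 1
                wait = backoff_slots(collisions[s], rng) if use_backoff else 0
                next_try[s] = slot + 1 + wait
        if all(v is not None for v in done.values()):
            break
    return done


if __name__ == "__main__":
    # Immediate retries keep both schedulers in lockstep: {0: None, 1: None}
    print(simulate(2, use_backoff=False, max_slots=50))
    # Randomized, growing waits break the symmetry
    print(simulate(2, use_backoff=True))
```

Without back-off, the two schedulers retry in the same slot every time and never make progress, which is the livelock. Randomizing the wait and doubling its range after each conflict makes repeated collisions increasingly unlikely. The assumed `cap` keeps the waiting window bounded.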
