Fine-Grain Cycle Stealing for Networks of Workstations

Studies have shown that workstations are idle a significant fraction of the time. In this paper we present a new scheduling policy called Linger-Longer that exploits the fine-grained availability of workstations to run sequential and parallel jobs. We present a two-level workload characterization study and use it to simulate a cluster of workstations running our new policy. We compare two variations of our policy to two previous policies: Immediate-Eviction and Pause-and-Migrate. Our study shows that the Linger-Longer policy can improve the throughput of foreign jobs on a cluster by 60% with only a 0.5% slowdown of foreground jobs. For parallel computing, we show that the Linger-Longer policy outperforms reconfiguration strategies when the processor utilization by the local process is 20% or less, in both synthetic bulk synchronous and real data-parallel applications.
