Load-sharing in heterogeneous systems via weighted factoring

Jeanette Schmidt, R. N. Uma, Joel Wein

Department of Computer Science, Polytechnic University, Brooklyn, NY 11201. Research supported by ARPA/USAF under Grant no. F30602-95-1-0008 and the New York State Science and Technology Foundation through its Center for Advanced Technology in Telecommunications. Joel Wein was supported in part by NSF Grant CCR-9211494, and Jeanette Schmidt in part by NSF Grant CCR-9305873.

We consider the problem of scheduling a parallel loop with independent iterations on a network of heterogeneous workstations, and demonstrate the effectiveness of a variant of factoring, a scheduling policy originating in the context of shared address-space homogeneous multiprocessors. In the new scheme, weighted factoring, processors are dynamically assigned chunks of iterations of decreasing size, in proportion to their processing speeds. Through experiments on a network of Sun SPARC workstations we show that weighted factoring significantly outperforms variants of a work-stealing load-balancing algorithm and, on certain applications, dramatically outperforms factoring as well. We then study weighted work assignment analytically, giving upper and lower bounds on its performance under the assumption that the processor iteration execution times can be modeled as weighted random variables.
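The chunk-size rule behind weighted factoring can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each batch hands out a fixed fraction (here one half) of the remaining iterations, each processor's share of a batch is rounded up, and the weights are fixed relative speeds known in advance. The function name `weighted_factoring_chunks` and its parameters are hypothetical. The sketch computes only the schedule of chunk sizes; in the scheme described above, chunks are handed out dynamically as processors become idle.

```python
from math import ceil


def weighted_factoring_chunks(total_iterations, weights, batch_fraction=0.5):
    """Return a list of batches; each batch lists one chunk size per processor.

    Assumptions (not taken from the paper): every batch covers
    `batch_fraction` of the remaining iterations, and a processor's share
    of a batch is proportional to its weight, rounded up.
    """
    if total_iterations < 0:
        raise ValueError("total_iterations must be non-negative")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")
    if not 0 < batch_fraction <= 1:
        raise ValueError("batch_fraction must be in (0, 1]")

    weight_sum = sum(weights)
    remaining = total_iterations
    batches = []
    while remaining > 0:
        # Size of this batch before it is split among the processors.
        batch_total = max(1, ceil(remaining * batch_fraction))
        batch = []
        for w in weights:
            if remaining == 0:
                batch.append(0)  # nothing left for this processor
                continue
            # Faster processors (larger weight) receive larger chunks.
            size = min(remaining, max(1, ceil(batch_total * w / weight_sum)))
            batch.append(size)
            remaining -= size
        batches.append(batch)
    return batches


if __name__ == "__main__":
    # Three workstations; the third is assumed to be twice as fast.
    for number, batch in enumerate(weighted_factoring_chunks(100, [1, 1, 2]), 1):
        print(f"batch {number}: {batch}")
```

With equal weights the same function reduces to an unweighted, factoring-style schedule, which makes it easy to compare the two policies on the same loop size.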
