Paper Information - A Hierarchical Load-Balancing Framework for Dynamic Multithreaded Computations

A Hierarchical Load-Balancing Framework for Dynamic Multithreaded Computations

High-level parallel programming models supporting dynamic fine-grained threads in a global object space are becoming increasingly popular for expressing irregular applications based on sophisticated adaptive algorithms and pointer-based data structures. However, implementing these multithreaded computations on scalable parallel machines poses significant challenges, particularly with respect to load-balancing. Load-balancing techniques must incur low enough overhead to support fine-grained threads and, at the same time, be sophisticated enough to preserve data locality and thread execution priority. This paper presents a hierarchical framework that addresses these conflicting goals by viewing the computation as being made up of different thread subsets, each of which is load-balanced independently. In contrast to previous processor-centric approaches, which have advocated a uniform policy for load-balancing all threads in a computation, our framework allows each thread subset to be load-balanced using the policy most suited to its characteristics (e.g., locality or priority sensitivity). The framework consists of two parts: (i) language support that permits a programmer to tag different thread subsets with appropriate policies, and (ii) run-time support that synthesizes overall application load-balance by composing these individual policies. This framework has been implemented in the Illinois Concert runtime system, an execution platform for fine-grained concurrent object-oriented languages. Results for four large irregular applications on the Cray T3D and the SGI Origin 2000 demonstrate the advantages of the hierarchical framework: performance improves by up to an order of magnitude compared with using a uniform load-balancing policy.
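The two-part structure described in the abstract (tagging thread subsets with policies, then composing those policies at run time) can be illustrated with a small sketch. This is not the Illinois Concert API or the paper's algorithm: the names `Task`, `Balancer`, `tag`, `locality_policy`, and `least_loaded_policy` are hypothetical, and the composition rule used here (every policy places its own subset against one shared per-processor load vector) is an assumption chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    name: str
    cost: int          # estimated work units (hypothetical measure)
    home: int = 0      # processor that holds the task's data
    priority: int = 0  # larger value means more urgent


# A policy places one subset's tasks, updating the shared load vector in place.
Policy = Callable[[List[Task], List[int]], Dict[str, int]]


def locality_policy(tasks: List[Task], loads: List[int]) -> Dict[str, int]:
    """Keep every task on the processor that owns its data."""
    placement = {}
    for t in tasks:
        placement[t.name] = t.home
        loads[t.home] += t.cost
    return placement


def least_loaded_policy(tasks: List[Task], loads: List[int]) -> Dict[str, int]:
    """Place tasks, highest priority first, on the least-loaded processor."""
    placement = {}
    for t in sorted(tasks, key=lambda t: -t.priority):
        target = min(range(len(loads)), key=loads.__getitem__)
        placement[t.name] = target
        loads[target] += t.cost
    return placement


@dataclass
class Balancer:
    num_procs: int
    subsets: Dict[str, Tuple[Policy, List[Task]]] = field(default_factory=dict)

    def tag(self, subset: str, policy: Policy, tasks: List[Task]) -> None:
        """Associate a named thread subset with its own balancing policy."""
        self.subsets[subset] = (policy, tasks)

    def balance(self) -> Tuple[Dict[str, int], List[int]]:
        """Run each subset's policy in tagging order against shared loads."""
        loads = [0] * self.num_procs
        placement: Dict[str, int] = {}
        for policy, tasks in self.subsets.values():
            placement.update(policy(tasks, loads))
        return placement, loads
```

In this sketch a locality-sensitive subset stays where its data lives, while a priority-sensitive subset fills in around it. A uniform policy would instead treat both subsets the same way, which is the situation the abstract contrasts against.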

Andrew A. Chien | Vijay Karamcheti
