Automating runtime optimizations for parallel object-oriented programming

Software development for parallel computers has been recognized as one of the bottlenecks preventing their widespread use. In this thesis we examine two complementary approaches to the twin challenges of high performance and programmability in parallel programs: automated optimizations and object orientation. We have developed the parallel object-oriented language Charm++, an extension of C++, which brings the benefits of object orientation to the problems of parallel programming. To improve parallel program performance without extra programmer effort, we explore automated optimizations; in particular, we have developed techniques for automating run-time optimizations for parallel object-oriented languages. These techniques are embodied in Paradise, a post-mortem analysis tool that applies several run-time optimizations without programmer intervention. Paradise builds a program representation from execution traces, analyzes the program's characteristics, chooses and parameterizes optimizations, and generates hints for the Charm++ run-time libraries. The optimizations studied address static and dynamic object placement, scheduling, granularity control, and communication reduction. We also evaluate Charm++, Paradise, and several run-time optimization techniques on real applications, including an N-body simulation program, a program from the NAS benchmark suite, and several other programs.
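The abstract does not describe Paradise's actual trace format or algorithms. As a rough illustration of the trace-to-hint pipeline it outlines, the Python sketch below aggregates trace records into per-object load and pairwise communication volume, then derives a static object placement with a simple greedy heuristic. The record type `TraceEvent`, the functions `build_representation` and `placement_hints`, the `comm_weight` parameter, and the heuristic itself are all hypothetical, chosen only for this example. They are not Paradise's method or the Charm++ API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TraceEvent:
    # Hypothetical trace record: one message from object `src` to object `dst`,
    # its size in bytes, and the time the receiving method spent executing.
    src: str
    dst: str
    nbytes: int
    exec_time: float


def build_representation(events):
    """Aggregate trace events into per-object load and pairwise communication volume."""
    load = defaultdict(float)
    comm = defaultdict(int)
    for e in events:
        load[e.dst] += e.exec_time      # execution time is charged to the receiver
        load.setdefault(e.src, 0.0)     # senders appear even if they never receive
        if e.src != e.dst:              # self-messages cost no inter-processor traffic
            comm[tuple(sorted((e.src, e.dst)))] += e.nbytes
    return dict(load), dict(comm)


def placement_hints(load, comm, num_pes, comm_weight=0.0):
    """Greedy static placement (a hypothetical heuristic, not Paradise's).

    Objects are placed heaviest first. Each goes to the processor (PE) that
    minimizes its resulting load minus comm_weight times the bytes it exchanges
    with objects already placed there. Returns (object -> PE, per-PE load).
    """
    assignment = {}
    pe_load = [0.0] * num_pes
    for obj in sorted(load, key=lambda o: (-load[o], o)):
        def cost(pe):
            # Bytes this object exchanges with objects already assigned to `pe`.
            local = sum(nbytes for (a, b), nbytes in comm.items()
                        if obj in (a, b) and assignment.get(b if a == obj else a) == pe)
            return pe_load[pe] + load[obj] - comm_weight * local
        best = min(range(num_pes), key=lambda pe: (cost(pe), pe))
        assignment[obj] = best
        pe_load[best] += load[obj]
    return assignment, pe_load


if __name__ == "__main__":
    trace = [
        TraceEvent("A", "B", 100, 4.0),
        TraceEvent("C", "A", 10, 3.0),
        TraceEvent("B", "C", 10, 2.0),
        TraceEvent("A", "D", 0, 1.0),
    ]
    load, comm = build_representation(trace)
    print(placement_hints(load, comm, num_pes=2))                   # load balance only
    print(placement_hints(load, comm, num_pes=2, comm_weight=0.1))  # also favor locality
```

With `comm_weight=0` the heuristic only balances load. Raising it trades some balance for co-locating heavily communicating objects: in the example, `A` and `B`, which exchange the most data, end up on the same processor. The returned mapping stands in for the placement hints a run-time system would consume.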