Performance and Productivity in Parallel Programming via Processor Virtualization

We have been pursuing a research program at the Parallel Programming Laboratory of the University of Illinois for the past decade, aimed at enhancing productivity and performance in parallel computing. We summarize the basic approach and explain why it has improved, and will further improve, both productivity and performance. The centerpiece of our approach is a technique called processor virtualization: the program's computation is divided into a large number of chunks (called virtual processors), which are mapped to physical processors by an adaptive, intelligent runtime system. The runtime system also mediates communication between virtual processors. This approach makes a number of runtime optimizations possible. We argue that the following strategies are necessary to improve productivity in parallel programming:

• Automated resource management via processor virtualization
• Modularity via concurrent composability
• Reusability via frameworks, libraries, and multiparadigm interoperability

Of these, the first two benefit directly from processor virtualization, while the last is impacted indirectly. We describe our research on all these fronts.
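The core idea of processor virtualization, over-decomposing the computation into many more chunks than physical processors so a runtime can map them adaptively, can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not the Charm++ runtime's actual API: `map_chunks` and the greedy longest-processing-time heuristic are stand-ins for the adaptive load balancers the abstract alludes to.

```python
# A minimal sketch of processor virtualization: split work into many
# "virtual processors" (chunks) and let a runtime map them to a small
# number of physical processors. Names here are illustrative only.
import heapq

def map_chunks(chunk_loads, num_procs):
    """Greedily map chunks to processors, heaviest chunk first,
    always placing the next chunk on the least-loaded processor."""
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, proc id)
    heapq.heapify(heap)
    assignment = {}
    for chunk, load in sorted(enumerate(chunk_loads), key=lambda c: -c[1]):
        proc_load, proc = heapq.heappop(heap)
        assignment[chunk] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return assignment

# Over-decomposition: 32 unequal chunks onto 4 physical processors.
loads = [1.0 + 0.5 * (i % 7) for i in range(32)]
mapping = map_chunks(loads, 4)
per_proc = [sum(loads[c] for c, p in mapping.items() if p == q)
            for q in range(4)]
imbalance = max(per_proc) / (sum(loads) / 4)  # 1.0 would be perfect balance
```

Because there are far more chunks than processors, the runtime has the freedom to even out load; with only one chunk per processor, no such balancing would be possible. In a real system the chunk loads would be measured at runtime and the mapping revised periodically.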
