Supporting high level programming with high performance: the Illinois Concert system

Programmers of concurrent applications face a complex performance space in which data distribution and concurrency management exacerbate the difficulty of building large, complex applications. To address these challenges, the Illinois Concert system provides a global namespace, implicit concurrency control and granularity management, implicit storage management, and object-oriented programming features. These features are embodied in a language, ICC++ (derived from C++), which has been used to build a number of kernels and applications. Because high-level features can incur overhead, the Concert system employs a range of compiler and runtime optimization techniques to support the high-level programming model efficiently. The compiler techniques include type inference, inlining, and specialization; the runtime techniques include caching, prefetching, and hybrid stack/heap multithreading. The effectiveness of these techniques permits the construction of complex parallel applications that are flexible, enabling convenient application modification or tuning. We present performance results for a number of application programs that attain good speedups and absolute performance.
