Integrated Support for Heterogeneous Parallelism

This paper describes an integrated architecture, compiler, runtime, and operating system solution for exploiting heterogeneous parallelism. The architecture is a pipelined multithreaded multiprocessor that supports parallel activities ranging from very fine (multiple operations within an instruction) to very coarse (multiple jobs). The compiler and runtime focus on managing parallelism within a job, while the operating system focuses on managing parallelism across jobs. By considering the entire system in the design, we were able to interface its four components smoothly. While each component is primarily responsible for managing its own level of parallel activity, feedback mechanisms between components enable resource allocation and usage to be changed dynamically. This dynamic adaptation to changing requirements and available resources fosters both high utilization of the machine and the efficient expression and execution of parallelism.
