Describes the techniques used in the CM Fortran 1.0 compiler to map the fine-grained array parallelism of Fortran 90 onto the CM-2 architecture. The compiler views the parallel hardware at a much lower level of detail than did previous CM-2 compilers, which targeted a function library named Paris. In the slicewise machine model used by CM Fortran 1.0, the floating-point units (FPUs), their registers, and the memory hierarchy are directly exposed to the compiler. Thus, the target machine is not 64K simple bit-serial processors; rather, it is a machine of 2K processing elements (PEs), each of which is both superpipelined and superscalar. The compiler uses data distribution to spread the problem across the 2K PEs. A new compiler phase separates the code that runs on the two types of processors in the CM: the parallel PEs, which execute a new RISC-like instruction set called PEAC, and the scalar front-end processor, which executes SPARC or VAX assembler code. The pipelines in the PEs are filled using conventional vector-processing techniques together with the new RISC-like vector instruction set, and an innovative scheduler overlaps the execution of multiple RISC operations. This new compiler has greatly increased the performance of Fortran codes on the CM-2 for many important computational kernels, such as climate modeling, seismic processing, and hydrodynamics simulation.
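To make the mapping concrete, the following is a minimal sketch, not taken from the paper: the program, the array size, and the per-PE arithmetic in the comments are illustrative assumptions. It shows the kind of whole-array Fortran 90 statement whose fine-grained parallelism a compiler of this kind maps onto the PEs.

    ! Illustrative only. Under a slicewise block distribution, each of the
    ! 2K PEs would hold a contiguous subgrid of every array (here,
    ! 256*256 / 2048 = 32 elements per PE), and the compiler would drive
    ! each PE's FPU pipeline over its subgrid with PEAC vector instructions.
    program array_example
      implicit none
      integer, parameter :: n = 256
      real, dimension(n, n) :: a, b, c
      real :: alpha

      alpha = 2.5
      a = 1.0
      b = 3.0

      ! One fine-grained data-parallel statement: elementwise over all
      ! n*n elements at once, with no explicit loop for the compiler
      ! to rediscover.
      c = alpha * a + b

      print *, 'c(1,1) =', c(1, 1)
    end program array_example

Because the parallelism is explicit in the array expression, the compiler can partition the data among the PEs and fill their pipelines directly, rather than having to recover parallelism from serial DO loops.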