This paper discusses software pipelining for a new class of architectures that we call transport-triggered. These architectures reduce the interconnection requirements between function units. They also offer code scheduling possibilities that are not available in traditional operation-triggered architectures. In addition, the scheduling freedom is extended by the use of so-called hybrid-pipelined function units. To exploit this freedom, existing scheduling techniques need to be extended. We present a software pipelining technique, based on Lam’s algorithm, that exploits the potential of transport-triggered architectures. Performance results are presented for several benchmark loops. Depending on the available transport capacity, MFLOP rates may increase significantly compared to scheduling without the extra degrees of freedom.

As stated in [5], transport-triggered MOVE architectures have extra degrees of freedom in instruction scheduling. This paper investigates whether and how those extra degrees of freedom influence the software pipelining iteration initiation interval. To that end, it adapts the existing software pipelining algorithms developed by Lam [2]. It is shown that transport-triggering may lead to a significant reduction of the iteration initiation interval and therefore to an increase of the MIPS and/or MFLOPS rate.

The remainder of this paper starts with an introduction to the MOVE class of architectures; it clarifies the idea of transport-triggered architectures. Section 3 formulates the software pipelining problem and its algorithmic solution for transport-triggered architectures. Section 4 describes the architecture characteristics and benchmarks used for the measurements. To study the influence of the extra scheduling freedom, the algorithm has been applied to the benchmarks under different scheduling disciplines. Section 5 compares and analyzes the measurements. Finally, Section 6 gives several conclusions and indicates further research to be done.
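The link between the initiation interval and the MFLOPS rate can be made concrete with a small sketch. It is not the algorithm of this paper: it only computes the standard resource-constrained lower bound on the initiation interval and the resulting steady-state throughput, ignoring recurrence constraints and loop prologue/epilogue. The resource names (`move_bus`, `fp_unit`), the per-iteration counts, and the 20 MHz clock are hypothetical values chosen for illustration.

```python
from math import ceil


def resource_min_ii(uses_per_iteration, units_available):
    """Resource-constrained lower bound on the initiation interval (in cycles).

    For each resource, one iteration needs `uses` slots and each cycle offers
    `units_available[resource]` slots, so at least ceil(uses / units) cycles
    must separate the starts of successive iterations.
    """
    return max(
        ceil(uses / units_available[resource])
        for resource, uses in uses_per_iteration.items()
    )


def steady_state_mflops(flops_per_iteration, ii_cycles, clock_mhz):
    """Steady-state rate: one iteration completes every `ii_cycles` cycles."""
    return flops_per_iteration * clock_mhz / ii_cycles


# Hypothetical loop body: 6 data transports and 2 floating-point operations
# per iteration (assumed numbers, not taken from the paper's benchmarks).
uses = {"move_bus": 6, "fp_unit": 2}

for buses in (1, 2, 3):
    ii = resource_min_ii(uses, {"move_bus": buses, "fp_unit": 1})
    rate = steady_state_mflops(2, ii, clock_mhz=20.0)
    print(f"buses={buses}  min II={ii}  MFLOPS={rate:.2f}")
```

Under these assumptions, adding transport capacity lowers the bound on the initiation interval until another resource (here the single floating-point unit) becomes the bottleneck, which is the sense in which the achievable rate depends on the available transport capacity.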
[1] Toshio Nakatani, et al. A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture. 1990.
[2] Alexander Aiken, et al. Optimal loop parallelization. PLDI '88, 1988.
[3] Elliott Irving Organick, et al. Interpreting machines: Architecture and programming of the B1700/B1800 series (Operating and programming systems series). 1978.
[4] B. Ramakrishna Rau, et al. The Cydra 5 departmental supercomputer: design philosophies, decisions, and trade-offs. Computer, 1989.
[5] Henk Corporaal, et al. MOVE: a framework for high-performance processor design. Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91), 1991.
[6] Jing Wang, et al. Loop-carried dependence and the general URPR software pipelining approach (unrolling, pipelining and rerolling). Proceedings of the Twenty-Fourth Annual Hawaii International Conference on System Sciences, 1991.
[7] Monica Sin-Ling Lam, et al. A Systolic Array Optimizing Compiler. 1989.