Loop optimization for horizontal microcoded machines

Long Instruction Word (LIW) architectures exploit parallelism among their functional units. To produce efficient code for such an architecture, the microcode compiler must expose a relatively large degree of fine-grained parallelism and take the low-level characteristics of the architecture into account. This paper describes a microcode compiler developed at IRISA for such architectures. After a brief overview of the compilation process, we focus on loop scheduling techniques. We first describe the software pipelining algorithm. We then introduce a new unrolling-based optimization algorithm and compare it with classical software pipelining. The new algorithm differs from traditional loop unrolling in that the loop is unrolled only to find a cyclic schedule; that schedule is then used to construct a software pipeline.
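
To make the idea of a software-pipelined loop concrete, the following is a minimal sketch in Python. It is not the IRISA compiler's algorithm. It assumes a hypothetical loop body of three one-cycle stages (`load`, `mul`, `store`), each on its own functional unit, and a fixed initiation interval `ii`, so that stage k of iteration i is issued at cycle i*ii + k. The function names `software_pipeline` and `split_phases` are illustrative only.

```python
def software_pipeline(stages, n_iters, ii=1):
    """Return a list of cycles; each cycle lists the (stage, iteration) pairs issued.

    Assumption: stage k of iteration i is issued at cycle i*ii + k, and every
    stage runs on its own functional unit, so there are no resource conflicts
    when ii == 1.
    """
    n_cycles = (n_iters - 1) * ii + len(stages)
    schedule = [[] for _ in range(n_cycles)]
    for i in range(n_iters):
        for k, stage in enumerate(stages):
            schedule[i * ii + k].append((stage, i))
    return schedule


def split_phases(schedule, n_stages):
    """Split a schedule built with ii == 1 into prologue, kernel, and epilogue.

    The kernel is the run of cycles in which all stages are busy; this
    requires at least as many iterations as stages.
    """
    full = [c for c, ops in enumerate(schedule) if len(ops) == n_stages]
    first, last = full[0], full[-1]
    return schedule[:first], schedule[first:last + 1], schedule[last + 1:]


stages = ["load", "mul", "store"]   # hypothetical loop body
n_iters = 5

schedule = software_pipeline(stages, n_iters)
prologue, kernel, epilogue = split_phases(schedule, len(stages))

sequential_cycles = n_iters * len(stages)   # one iteration after another
pipelined_cycles = len(schedule)            # overlapped iterations

for cycle, ops in enumerate(schedule):
    print(cycle, ops)
print("sequential:", sequential_cycles, "pipelined:", pipelined_cycles)
```

In this toy setting every kernel cycle issues one operation per functional unit, each belonging to a different iteration. That repeating pattern is the kind of cyclic schedule the abstract refers to; the prologue and epilogue fill and drain the pipeline.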
