Very long instruction word architectures and the ELI-512

By compiling ordinary scientific applications programs with a radical technique called trace scheduling, we are generating code for a parallel machine that will run these programs faster than an equivalent sequential machine—we expect 10 to 30 times faster. Trace scheduling generates code for machines called Very Long Instruction Word architectures. In Very Long Instruction Word machines, many statically scheduled, tightly coupled, fine-grained operations execute in parallel within a single instruction stream. VLIWs are more parallel extensions of several current architectures. These current architectures have never cracked a fundamental barrier. The speedup they get from parallelism is never more than a factor of 2 to 3. Not that we couldn't build more parallel machines of this type; but until trace scheduling we didn't know how to generate code for them. Trace scheduling finds sufficient parallelism in ordinary code to justify thinking about a highly parallel VLIW. At Yale we are actually building one. Our machine, the ELI-512, has a horizontal instruction word of over 500 bits and will do 10 to 30 RISC-level operations per cycle [Patterson 82]. ELI stands for Enormously Longword Instructions; 512 is the size of the instruction word we hope to achieve. (The current design has a 1200-bit instruction word.) Once it became clear that we could actually compile code for a VLIW machine, some new questions appeared, and answers are presented in this paper. How do we put enough tests in each cycle without making the machine too big? How do we put enough memory references in each cycle without making the machine too slow?
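To make the idea of statically scheduled operations sharing one long instruction word concrete, here is a minimal sketch in Python. It is not trace scheduling and not the ELI-512 compiler: it only packs a single straight-line block, whereas trace scheduling works across branches. The `Op` record, the `pack_into_long_words` function, the unit-latency assumption, and the rule that each register is written at most once are all illustrative assumptions, not details from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One fine-grained operation (hypothetical format): writes dest, reads srcs."""
    dest: str
    srcs: tuple


def pack_into_long_words(ops, width):
    """Greedily pack ops into long instruction words of at most `width` ops each.

    Assumptions for this sketch: straight-line code, every op takes one cycle,
    and each register is written at most once (so only true dependences matter).
    """
    ready_at = {}  # register name -> index of the first word allowed to read it
    words = []     # words[i] is the list of ops issued together in cycle i
    for op in ops:
        # An op may not issue before the word after each producer of its inputs.
        slot = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        # Skip words that already hold `width` operations.
        while slot < len(words) and len(words[slot]) >= width:
            slot += 1
        # Grow the schedule if the chosen word does not exist yet.
        while len(words) <= slot:
            words.append([])
        words[slot].append(op)
        ready_at[op.dest] = slot + 1
    return words


# Hypothetical block computing t5 = (a op b op c op d) op (e op f).
EXAMPLE_OPS = [
    Op("t1", ("a", "b")),
    Op("t2", ("c", "d")),
    Op("t3", ("t1", "t2")),
    Op("t4", ("e", "f")),
    Op("t5", ("t3", "t4")),
]

if __name__ == "__main__":
    for cycle, word in enumerate(pack_into_long_words(EXAMPLE_OPS, width=4)):
        print(cycle, [op.dest for op in word])
```

With a width of 4 the five operations fit in three words, while a width of 1 degenerates to five sequential instructions. The dependence chain t1/t2, t3, t5 limits the gain inside this one block, which is consistent with the abstract's point that more parallelism has to be found beyond such small regions.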

[1]  Michael J. Flynn, et al. Detection and Parallel Execution of Independent Instructions, 1970, IEEE Transactions on Computers.

[2]  Edward M. Riseman, et al. The Inhibition of Potential Parallelism by Conditional Jumps, 1972, IEEE Transactions on Computers.

[3]  Edward M. Riseman, et al. Percolation of Code to Enhance Parallel Dispatching and Execution, 1972, IEEE Transactions on Computers.

[4]  Alfred V. Aho, et al. Principles of Compiler Design, 1977.

[5]  Alfred V. Aho, et al. Principles of Compiler Design (Addison-Wesley series in computer science and information processing), 1977.

[6]  David A. Patterson, et al. Towards an efficient, machine-independent language for microprogramming, 1979, MICRO 12.

[7]  Subrata Dasgupta, et al. The Organization of Microprogram Stores, 1979, CSUR.

[8]  Joseph A. Fisher, et al. 2n-way jump microinstruction hardware and an effective instruction binding method, 1980, SIGM.

[9]  David A. Padua, et al. High-Speed Multiprocessors and Compilation Techniques, 1980, IEEE Transactions on Computers.

[10]  Joseph A. Fisher, et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.

[11]  Alexandru Nicolau, et al. Using an oracle to measure potential parallelism in single instruction stream programs, 1981, MICRO 14.

[12]  Thomas R. Gross, et al. Optimizing delayed branches, 1982, MICRO 15.

[13]  Norman P. Jouppi, et al. MIPS: A microprocessor architecture, 1982, MICRO 15.

[14]  Dean Jacobs, et al. Monte Carlo techniques in code optimization, 1982, MICRO 15.

[15]  Carlo H. Séquin, et al. A VLSI RISC, 1982, Computer.