New Code Generation Algorithm for QueueCore: An Embedded Processor with High ILP

Modern architectures rely on instruction-level parallelism (ILP) to achieve high performance. Aggressive ILP compilers expose so much parallelism that, in some cases, the architected registers are too few to hold the results of the instructions that could execute in parallel. This paper presents a new code generation scheme for the QueueCore, a 32-bit queue-based architecture capable of executing large amounts of ILP. QueueCore instructions read their operands and write their results implicitly. Compiling for the QueueCore requires that every instruction have at most one explicit operand, represented as an offset calculated at compile time, and that instructions be scheduled in level order. The proposed algorithm restricts all instructions to at most one offset reference, computes the offset values, and produces a level-order schedule of the program. To evaluate the new code generation scheme, we developed a queue compiler and compiled a set of benchmark programs. Our results show that the generated code has more parallelism than optimized RISC code by factors ranging from 1.12 to 2.30. QueueCore's instruction set allows us to generate code about 18% to 40% denser than optimized RISC code.
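To make the idea of level-order scheduling with implicit operands concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a plain FIFO queue machine in which every binary operator dequeues its two operands from the head of the queue and enqueues its result at the tail, and it handles only expression trees. The offset computation that QueueCore needs for reused operands is not modeled. The names `Node`, `level_order_schedule`, and `run_queue_program` are hypothetical and chosen for illustration.

```python
from collections import deque
import operator


class Node:
    """Expression-tree node: a leaf holds a variable name, an inner node an operator."""

    def __init__(self, op, left=None, right=None):
        self.op = op
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def level_order_schedule(root):
    """Return the nodes ordered from the deepest level up to the root,
    left to right within each level."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        # Children of the current level, kept in left-to-right order.
        frontier = [child
                    for node in frontier
                    for child in (node.left, node.right)
                    if child is not None]
    return [node for level in reversed(levels) for node in level]


def run_queue_program(schedule, env):
    """Execute a schedule on a simple FIFO queue machine (an assumed model).

    Leaves enqueue the value of their variable. Operators dequeue two
    operands from the head and enqueue the result at the tail, so no
    instruction names an operand explicitly.
    """
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    queue = deque()
    for node in schedule:
        if node.is_leaf():
            queue.append(env[node.op])
        else:
            lhs = queue.popleft()
            rhs = queue.popleft()
            queue.append(ops[node.op](lhs, rhs))
    return queue.popleft()


# (a + b) * (c - d)
tree = Node("*",
            Node("+", Node("a"), Node("b")),
            Node("-", Node("c"), Node("d")))
schedule = level_order_schedule(tree)
print(" ".join(node.op for node in schedule))   # a b c d + - *
print(run_queue_program(schedule, {"a": 1, "b": 2, "c": 7, "d": 3}))  # 12
```

In this sketch the four loads sit on one level and the `+` and `-` on the next, so instructions within a level have no dependences on each other; this is the kind of parallelism a level-order schedule exposes.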
