New Code Generation Algorithm for QueueCore An Embedded Processor with High ILP

Modern architectures rely on exploiting parallelism found at the instruction level to achieve high performance. Aggressive ILP compilers expose high amounts of instruction level parallelism where, in some cases, the number of architected registers is not enough to hold the results of potential parallel instructions. This paper presents a new code generation scheme for the QueueCore, a 32-bit queue-based architecture capable of executing high amounts of ILP. QueueCore's instructions implicitly read their operands and write results. Compiling for the QueueCore requires that all instructions have at most one explicit operand represented as an offset calculated at compile-time. Additionally, the instructions must be scheduled in level-order manner. The proposed algorithm successfully restricts all instructions to have at most one offset reference, it computes the offset values, and makes a level-order scheduling of the program. To evaluate the effectiveness of the new code generation scheme we developed a queue compiler and compiled a set of benchmark programs. Our results show that the code has more parallelism than optimized RISC code by factors ranging from 1.12 to 2.30. QueueCore's instruction set allows us to generate code about 40%-18% denser than optimized RISC code.

[1]  Javier Zalamea,et al.  Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures , 2004, International Journal of Parallel Programming.

[2]  Tsutomu Yoshinaga,et al.  High-Level Modeling and FPGA Prototyping of Produced Order Parallel Queue Processor Core , 2006, The Journal of Supercomputing.

[3]  Scott A. Mahlke,et al.  Partitioning variables across register windows to reduce spill code in a low-power processor , 2005, IEEE Transactions on Computers.

[4]  Corporate SPARC architecture manual - version 8 , 1992 .

[5]  Jorge J. Moré,et al.  The NEOS Server , 1998 .

[6]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[7]  Andrew Lewis,et al.  NIMROD/O: A TOOL FOR AUTOMATIC DESIGN OPTIMISATION USING PARALLEL AND DISTRIBUTED SYSTEMS , 2000 .

[8]  Henri Casanova,et al.  Overview of GridRPC: A Remote Procedure Call API for Grid Computing , 2002, GRID.

[9]  Jr. Philip J. Koopman,et al.  Stack computers: the new wave , 1989 .

[10]  Gerry Kane,et al.  MIPS RISC Architecture , 1987 .

[11]  Bruno R. Preiss,et al.  Data flow on a queue machine , 1985, ISCA 1985.

[12]  Satoshi Matsuoka,et al.  Ninf-G: A Reference Implementation of RPC-based Programming Middleware for Grid Computing , 2003, Journal of Grid Computing.

[13]  Masahiro Sowa,et al.  Design and architecture for an embedded 32-bit QueueCore , 2006, J. Embed. Comput..

[14]  Lenwood S. Heath,et al.  Stack and Queue Layouts of Directed Acyclic Graphs: Part I , 1999, SIAM J. Comput..

[15]  Edward S. Davidson,et al.  Evaluating the Use of Register Queues in Software Pipelined Loops , 2001, IEEE Trans. Computers.

[16]  Herman Schmit,et al.  Queue machines: hardware compilation in hardware , 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[17]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[18]  Kevin D. Kissell MIPS16: High-density MIPS for the Embedded Market1 , 1997 .

[19]  Masahiro Sowa,et al.  Design of a superscalar processor based on queue machine computation model , 1999, 1999 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM 1999). Conference Proceedings (Cat. No.99CH36368).

[20]  Tsutomu Yoshinaga,et al.  Parallel Queue Processor Architecture Based on Produced Order Computation Model , 2005, The Journal of Supercomputing.

[21]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.