Compiler Support for Code Size Reduction Using a Queue-Based Processor

Queue computing offers an attractive alternative for embedded systems. The main features of a queue-based processor are a dense instruction set, a high degree of parallelism, and low hardware complexity. This paper presents the design of a code generation algorithm, implemented in the queue compiler infrastructure, that achieves high code density by targeting a queue-based instruction set processor. We evaluate the efficiency of our code generation technique by comparing code size and extracted parallelism for a set of embedded applications against a set of conventional embedded processors. On average, the compiled code is 12.03% more compact than MIPS16 code and 45.1% more compact than ARM/Thumb code. In addition, we show that the queue compiler, without optimizations, can deliver about 1.16 times more parallelism than fully optimized code for a register machine.
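One reason a queue instruction set can be dense is that arithmetic instructions name no operands: they implicitly read from the head of an operand queue and write to its tail. The Python sketch below illustrates this under a simplifying assumption. It models a generic produced-order queue machine and is not the authors' code generation algorithm, which this abstract does not describe. Under that assumption, an expression tree can be compiled by emitting its nodes level by level, from the deepest level up to the root, left to right within each level. The names `Node`, `level_order_codegen`, `run`, and the `load`/`add`/`sub`/`mul` mnemonics are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional
import operator

# Hypothetical binary operators for this toy queue machine.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass
class Node:
    """Expression-tree node: a leaf holds a variable name, an inner node an op."""
    op: Optional[str] = None        # "add", "sub" or "mul" for inner nodes
    name: Optional[str] = None      # variable name for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def level_order_codegen(root: Node) -> list[tuple[str, Optional[str]]]:
    """Emit queue-machine code from the deepest tree level up to the root,
    visiting nodes left to right within each level."""
    levels: list[list[Node]] = []
    frontier = [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    code: list[tuple[str, Optional[str]]] = []
    for level in reversed(levels):
        for n in level:
            if n.op is None:
                code.append(("load", n.name))  # enqueue a variable's value
            else:
                code.append((n.op, None))      # zero-operand instruction
    return code


def run(code: list[tuple[str, Optional[str]]], env: dict[str, int]) -> int:
    """Execute the code: operands are dequeued from the head of the queue,
    results are enqueued at the tail."""
    q: deque = deque()
    for instr, arg in code:
        if instr == "load":
            q.append(env[arg])
        else:
            left = q.popleft()   # first value dequeued is the left operand
            right = q.popleft()
            q.append(OPS[instr](left, right))
    if len(q) != 1:
        raise ValueError("malformed program: queue should hold one result")
    return q[0]


if __name__ == "__main__":
    # (a + b) * (c - d)
    expr = Node(op="mul",
                left=Node(op="add", left=Node(name="a"), right=Node(name="b")),
                right=Node(op="sub", left=Node(name="c"), right=Node(name="d")))
    code = level_order_codegen(expr)
    print(code)
    print(run(code, {"a": 2, "b": 3, "c": 7, "d": 4}))  # 15
```

For `(a + b) * (c - d)` the sketch emits four loads followed by `add`, `sub`, `mul`. The three arithmetic instructions carry no register fields, which is the source of the density a queue ISA can exploit. The ordering also works for unbalanced trees: before a level is processed, the queue holds exactly the values of the level below it, in left-to-right order. Both `add` and `sub` sit on the same level and have no mutual dependence, which hints at the parallelism the abstract refers to.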
