Quantitative Evaluation of Common Subexpression Elimination on Queue Machines

The queue computation model is a novel alternative for high-performance architectures. Compiling for queue machines requires a different approach from compiling for traditional architectures. We have solved the problem of generating correct code with the queue compiler infrastructure. In this paper we introduce some of the problems encountered when optimizing code for queue machines. Common-subexpression elimination (CSE) is a widely used optimization that improves execution time. This paper presents a quantitative evaluation of how this optimization affects the characteristics of queue programs. We have found that, on average, 28% of instructions are eliminated and the critical path is shortened by 15%. We also determine how enlarging the scope of compilation from expressions to basic blocks affects the distribution of offsetted instructions.
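
As a rough illustration of what CSE does within a single basic block, the following minimal sketch removes repeated computations from a list of three-address instructions and counts how many were eliminated. It is not the queue compiler's implementation: the tuple format `(dest, op, arg1, arg2)`, the function name `local_cse`, and the assumption that every destination is assigned exactly once are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format for this sketch: (dest, op, arg1, arg2).
Instr = Tuple[str, str, str, str]


def local_cse(block: List[Instr]) -> Tuple[List[Instr], int]:
    """Eliminate recomputed (op, arg1, arg2) triples inside one basic block.

    Assumes each destination name is assigned only once, so an available
    expression never has to be invalidated.
    """
    available: Dict[Tuple[str, str, str], str] = {}  # expression -> name holding it
    alias: Dict[str, str] = {}  # eliminated dest -> surviving name
    out: List[Instr] = []
    eliminated = 0
    for dest, op, a, b in block:
        # Rewrite operands that referred to an eliminated instruction.
        a = alias.get(a, a)
        b = alias.get(b, b)
        key = (op, a, b)
        if key in available:
            # Same expression already computed: reuse its result.
            alias[dest] = available[key]
            eliminated += 1
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out, eliminated


# (a + b) * (a + b): the second addition is redundant.
block = [("t1", "+", "a", "b"), ("t2", "+", "a", "b"), ("t3", "*", "t1", "t2")]
optimized, removed = local_cse(block)
print(optimized, removed)
```

After the pass, `t1` feeds both operands of the multiplication, so the expression is no longer a tree but a graph with a shared node. Widening the scope from one expression to a whole basic block gives such a pass more instructions in which to find repeats.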
