Code Partitioning in Decoupled Compilers

Decoupled access/execute architectures seek to maximize performance by dividing a program into two separate instruction streams and executing them on independent, cooperating processors. One stream contains the instructions involved in generating memory accesses (the Access stream); the other contains the instructions that consume the data (the Execute stream). If the processor running the Access stream is able to get ahead of the Execute stream, operands are dynamically pre-loaded and the penalty due to long-latency operations (such as memory accesses) is reduced or eliminated. Although these architectures have been around for many years, published performance analyses have been incomplete for want of a compiler, and very little has been published on how to construct a compiler for such an architecture. In this paper we describe the partitioning method employed in Daecomp, a compiler for decoupled access/execute processors.
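To make the access/execute split concrete, the following is a minimal sketch of a dot-product loop divided by hand into two streams that communicate through a FIFO queue. It is an illustration only and does not reproduce Daecomp's partitioning method; the function names (`access_stream`, `execute_stream`, `dot_decoupled`) and the cycle-count model are assumptions introduced for this example. The timing functions assume a fully pipelined memory that accepts one load per cycle, an unbounded queue, and an Access stream that never waits on the Execute stream.

```python
from collections import deque


def access_stream(a, b, queue):
    """Hypothetical Access stream: index arithmetic and loads only."""
    for i in range(len(a)):
        # Address generation (the index i) and the two loads live here.
        queue.append((a[i], b[i]))


def execute_stream(queue, n):
    """Hypothetical Execute stream: consumes operands, does the arithmetic."""
    total = 0
    for _ in range(n):
        x, y = queue.popleft()  # operand arrives from the Access stream
        total += x * y
    return total


def dot_decoupled(a, b):
    """Run both streams; here the Access stream simply runs fully ahead."""
    queue = deque()
    access_stream(a, b, queue)
    return execute_stream(queue, len(a))


def coupled_cycles(n, latency, compute=1):
    """Toy model: every iteration waits for its load, then computes."""
    return n * (latency + compute)


def decoupled_cycles(n, latency, compute=1):
    """Toy model: loads overlap, so the latency is paid only once."""
    return latency + n * compute


if __name__ == "__main__":
    print(dot_decoupled([1, 2, 3], [4, 5, 6]))
    print(coupled_cycles(4, 10), decoupled_cycles(4, 10))
```

Under these simplifying assumptions, the memory latency is exposed once rather than on every iteration, which is the effect the Access stream achieves by running ahead. A real partitioner must also handle values that flow from the Execute stream back to the Access stream (for example, data-dependent addresses), which this sketch omits.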
