Dataflow Predication

Predication facilitates high-bandwidth fetch and large static scheduling regions, but has typically been too complex to implement comprehensively in out-of-order microarchitectures. This paper describes dataflow predication, which provides per-instruction predication in a dataflow ISA, low predication computation overheads similar to those of VLIW ISAs, and low-complexity out-of-order issue. A two-bit field in each instruction specifies whether the instruction is predicated; if it is, an arriving predicate token determines whether the instruction should execute. Dataflow predication incorporates three features that reduce predication overheads. First, dataflow predicate computation permits computation of compound predicates with virtually no overhead instructions. Second, early mispredication termination squashes in-flight instructions with false predicates at any time, eliminating the overhead of falsely predicated paths. Finally, implicit predication mitigates the fanout overhead of dataflow predicates: it reduces the number of explicitly predicated instructions by predicating only the heads of dependence chains. Dataflow predication also exposes new compiler optimizations, such as disjoint instruction merging and path-sensitive predicate removal, for increased performance of predicated code in an out-of-order design.
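
The firing rule described in the abstract can be illustrated with a small token-driven sketch. This is not the paper's implementation: the `Pred` encoding values, the `Instr` and `run` names, and the example program are all hypothetical, chosen only to show how a two-bit field plus an arriving predicate token decides whether an instruction executes, and how predicating only the head of a dependence chain leaves its consumers implicitly predicated. Timing, fanout cost, and early mispredication termination are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Pred(Enum):
    """Two-bit predicate field (encoding values are an assumption)."""
    NONE = 0b00      # unpredicated: fires once all operands arrive
    ON_FALSE = 0b10  # fires only if the predicate token is False
    ON_TRUE = 0b11   # fires only if the predicate token is True


@dataclass
class Instr:
    name: str
    op: Callable
    num_operands: int
    pred: Pred = Pred.NONE
    # Each target is (consumer name, slot); slot is an operand index or "p".
    targets: list = field(default_factory=list)
    operands: dict = field(default_factory=dict)
    pred_token: Optional[bool] = None
    fired: bool = False


def run(instrs, tokens):
    """Deliver tokens until none remain; return results of fired instructions."""
    by_name = {i.name: i for i in instrs}
    results = {}
    worklist = list(tokens)  # entries are (target name, slot, value)
    while worklist:
        name, slot, value = worklist.pop()
        ins = by_name[name]
        if slot == "p":
            ins.pred_token = value
        else:
            ins.operands[slot] = value
        if ins.fired or len(ins.operands) < ins.num_operands:
            continue  # already executed, or still waiting for operands
        if ins.pred is not Pred.NONE:
            if ins.pred_token is None:
                continue  # predicated, but no predicate token yet
            if ins.pred_token != (ins.pred is Pred.ON_TRUE):
                continue  # token does not match: instruction never fires
        ins.fired = True
        out = ins.op(*(ins.operands[k] for k in range(ins.num_operands)))
        results[name] = out
        for target_name, target_slot in ins.targets:
            worklist.append((target_name, target_slot, out))
    return results


def abs_diff_doubled(a, b):
    """Hypothetical program: 2 * |a - b| with two predicated chain heads."""
    instrs = [
        # The test instruction produces the predicate token.
        Instr("tgt", lambda x, y: x > y, 2,
              targets=[("sub_t", "p"), ("sub_f", "p")]),
        # Only the heads of the two chains carry an explicit predicate.
        Instr("sub_t", lambda x, y: x - y, 2, Pred.ON_TRUE,
              targets=[("dbl", 0)]),
        Instr("sub_f", lambda x, y: y - x, 2, Pred.ON_FALSE,
              targets=[("dbl", 0)]),
        # Unpredicated consumer: it receives an operand from exactly one
        # head, so it is implicitly predicated.
        Instr("dbl", lambda v: v * 2, 1),
    ]
    heads = ("tgt", "sub_t", "sub_f")
    tokens = [(n, 0, a) for n in heads] + [(n, 1, b) for n in heads]
    return run(instrs, tokens)
```

Calling `abs_diff_doubled(7, 3)` fires `tgt`, `sub_t`, and `dbl`, while `sub_f` receives a mismatched predicate token and never executes. In this sketch the consumer `dbl` needs no predicate field of its own, which is the effect the abstract attributes to implicit predication.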
