Using Timetags for Program Dependency Enforcement

Abstract

We discuss how time tags can be used for the enforcement of program dependencies. Time tags can serve as the basic ordering enforcement mechanism when a large number of instructions are executing concurrently. Proposed and future microarchitectures can have hundreds or several hundreds of instructions in flight simultaneously. Using standard reservation tags, physical register addresses, and reorder buffers, performance does not scale well even for moderate-sized instruction windows. Time tags address much of the complication of these units.

In this paper we discuss the design, use and management of time tags. We also provide simulation data for an example microarchitecture that illustrates the advantages of using time tags for dependency enforcement. IPCs in the range of 3.8-5.1 are obtained for the range of machine configurations studied running SpecInt-2000 and SpecInt-95 programs.

1 Introduction

In this paper we discuss a means by which time tags can be used to enforce proper program dependencies while allowing massive out-of-order execution by a suitable (and likely large) microarchitecture. Several studies into instruction level parallelism have shown that there is parallelism within typical programs that is not yet exploited by existing microarchitectures [8, 14, 27]. Suitable microarchitectures for extracting this ILP would need to execute a very large number of instructions in parallel. However, tracking and enforcing the dependencies for these large microarchitectures presents significant implementation difficulties. The use of time tags appears to be a mechanism well suited for this task.

We are proposing to use time tags to maintain and enforce correct program order for all flow dependencies whether they be registers, memory values, or instruction control-flow predicates. It may be noted that the time aspect of time tags refers to program-order time and not wall clock time. The goal is to allow for massive speculative out-of-order instruction execution while providing a means of tracking program order.

In this paper we will describe the design and use of time tags, and will present simulation results showing how they allow for high ILP on a distributed microarchitecture. The remainder of this paper is organized as follows. Section 2 will provide some high level applications for and background on the use of time tags. Specifically, microarchitectural capabilities that are likely desirable in future machines, and which may be implemented more easily using time tags, are presented. Section 3 presents a description of time tags and how they would fit into a potential microarchitecture. Also discussed are some optional microarchitectural features that can be facilitated once time tags are used as a program dependency enforcement mechanism. Section 4 briefly presents a proposed microarchitecture that uses time tags for all of its dependency tracking and enforcement. Simulated results from this microarchitecture are also presented. We summarize in Section 5.
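To make the notion of program-order time concrete, the following is a minimal sketch of how a waiting instruction might use time tags to decide which broadcast result to accept for one of its inputs. It is an illustration only and is not taken from the design presented in Section 3: the `Operand` class, its field names, the `snoop` method, and the exact comparison rule are all assumptions introduced here. The assumed rule is that a consumer accepts a value only from a producer that is earlier in program order than itself and at least as late as the producer whose value it currently holds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    """One input operand of a waiting instruction (hypothetical structure)."""
    name: str                     # register, memory address, or predicate id
    consumer_tag: int             # time tag of the instruction reading this operand
    value: Optional[int] = None   # value currently held, if any
    producer_tag: int = -1        # time tag of the producer of the held value

    def snoop(self, name: str, producer_tag: int, value: int) -> bool:
        """Return True if a broadcast result is accepted for this operand.

        Assumed rule: smaller tag means earlier in program order.
        """
        if name != self.name:
            return False          # result is for a different resource
        if producer_tag >= self.consumer_tag:
            return False          # producer is not earlier in program order
        if producer_tag < self.producer_tag:
            return False          # a nearer predecessor's value is already held
        # Accept: either a nearer predecessor, or a re-broadcast by the
        # same producer after it re-executed with a new value.
        self.value = value
        self.producer_tag = producer_tag
        return True


# Program order (time tags): 1: r1 = 10   3: r1 = 20   5: r2 = r1 + 1   7: r1 = 99
# Results reach the consumer at tag 5 out of order in wall-clock time.
src = Operand(name="r1", consumer_tag=5)
accepted = [
    src.snoop("r1", 3, 20),   # nearest earlier producer arrives first
    src.snoop("r1", 1, 10),   # older producer arrives late and is ignored
    src.snoop("r1", 7, 99),   # later instruction in program order is ignored
]
print(accepted, src.value)
```

In this sketch the arrival order of results does not matter: the consumer ends up holding the value written by the nearest preceding producer in program order, which is the sense in which the tags encode program-order time rather than wall clock time.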

[1] P. Bannon, et al. Internal architecture of Alpha 21164 microprocessor, 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.

[2] Antonio González, et al. Limits of Instruction Level Parallelism with Data Speculation, 1997.

[3] R. M. Tomasulo, et al. An efficient algorithm for exploiting multiple arithmetic units, 1995.

[4] Mikko H. Lipasti, et al. The Performance Potential of Value and Dependence Prediction, 1997, Euro-Par.

[5] Kevin Skadron, et al. A Scheme for Selective Squash and Re-issue for Single-Sided Branch Hammocks, 2001.

[6] E. Smith, et al. Selective Dual Path Execution, 1996.

[7] Augustus K. Uht, et al. Disjoint eager execution: an optimal form of speculative execution, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[8] Gurindar S. Sohi, et al. Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors, 1992, MICRO.

[9] Chen-Yong Cher, et al. Skipper: a microarchitecture for exploiting control-flow independence, 2001, MICRO.

[10] Monica S. Lam, et al. Limits of control flow on parallelism, 1992, ISCA '92.

[11] John G. Cleary, et al. The architecture of an optimistic CPU: the WarpEngine, 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[12] Dirk Grunwald, et al. Selective eager execution on the PolyPath architecture, 1998, ISCA.

[13] Gurindar S. Sohi, et al. Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors, 1992, MICRO 1992.

[14] James E. Smith, et al. The microarchitecture of superscalar processors, 1995, Proc. IEEE.

[15] Kevin Skadron, et al. HydraScalar: A Multipath-Capable Simulator, 2001.

[16] James E. Smith, et al. Complexity-Effective Superscalar Processors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[17] Mikko H. Lipasti, et al. Superspeculative Microarchitecture for Beyond AD 2000, 1997, Computer.

[18] Richard E. Kessler, et al. The Alpha 21264 microprocessor architecture, 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273).

[19] Thomas F. Wenisch, et al. IPC in the 10s via Resource Flow Computing with Levo, 2001.

[20] David R. Kaeli, et al. Realizing High IPC Using Time-Tagged Resource-Flow Computing, 2002, Euro-Par.

[21] Mikko H. Lipasti, et al. Exceeding the dataflow limit via value prediction, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[22] J. E. Thornton, et al. Parallel operation in the control data 6600, 1964, AFIPS '64 (Fall, part II).

[23] Brad Calder, et al. Threaded multiple path execution, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).