
Using Timetags for Program Dependency Enforcement

Abstract

We discuss how time tags can be used to enforce program dependencies. Time tags can serve as the basic ordering-enforcement mechanism when a large number of instructions are executing concurrently. Proposed and future microarchitectures can have hundreds of instructions in flight simultaneously. With standard reservation tags, physical register addresses, and reorder buffers, performance does not scale well even for moderate-sized instruction windows. Time tags address much of the complexity associated with these units.

In this paper we discuss the design, use, and management of time tags. We also provide simulation data for an example microarchitecture that illustrates the advantages of using time tags for dependency enforcement. IPCs in the range of 3.8-5.1 are obtained for the range of machine configurations studied, running SpecInt-2000 and SpecInt-95 programs.

1 Introduction

In this paper we discuss a means by which time tags can be used to enforce proper program dependencies while allowing massive out-of-order execution by a suitable (and likely large) microarchitecture. Several studies of instruction-level parallelism (ILP) have shown that typical programs contain parallelism not yet exploited by existing microarchitectures [8, 14, 27]. A microarchitecture suited to extracting this ILP would need to execute a very large number of instructions in parallel. However, tracking and enforcing the dependencies in such large microarchitectures presents significant implementation difficulties. Time tags appear to be a mechanism well suited to this task.

We propose using time tags to maintain and enforce correct program order for all flow dependencies, whether they involve registers, memory values, or instruction control-flow predicates. Note that the "time" in time tags refers to program-order time, not wall-clock time. The goal is to allow massive speculative out-of-order instruction execution while providing a means of tracking program order.

In this paper we describe the design and use of time tags, and present simulation results showing how they allow for high ILP on a distributed microarchitecture. The remainder of this paper is organized as follows. Section 2 provides some high-level applications for, and background on, the use of time tags; specifically, it presents microarchitectural capabilities that are likely desirable in future machines and that may be implemented more easily using time tags. Section 3 describes time tags and how they would fit into a potential microarchitecture, and also discusses some optional microarchitectural features that can be facilitated once time tags are used as a program dependency enforcement mechanism. Section 4 briefly presents a proposed microarchitecture that uses time tags for all of its dependency tracking and enforcement, together with simulated results from it. We summarize in Section 5.
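To make the idea of program-order time concrete, here is a minimal Python sketch of how a waiting instruction might use time tags to select the correct producer for a register operand. It rests on assumptions that this excerpt does not spell out. First, each instruction's time tag is its position in program order, so a smaller tag means an earlier instruction. Second, results are broadcast along with their producer's tag. Third, an operand accepts a broadcast only if the producer precedes the consumer and is no older than the producer whose value it currently holds. The names `Broadcast`, `Operand`, and `snoop` are hypothetical illustrations, not the paper's hardware design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Broadcast:
    """A result forwarded on a hypothetical result bus."""
    reg: str    # architected register name, e.g. "r1"
    value: int  # result value
    tag: int    # time tag (program-order position) of the producer


@dataclass
class Operand:
    """A source operand of a waiting instruction, matched by time tag."""
    reg: str
    owner_tag: int               # time tag of the consuming instruction
    value: Optional[int] = None
    source_tag: int = -1         # -1 = committed state, older than any in-window producer

    def snoop(self, b: Broadcast) -> bool:
        """Accept b if it comes from the latest producer seen so far that
        precedes the consumer in program order. Returns True if accepted."""
        if b.reg != self.reg:
            return False
        # Only producers earlier in program order than the consumer count.
        if b.tag >= self.owner_tag:
            return False
        # Ignore producers older than the one already held; ">=" (not ">")
        # lets a re-executed producer overwrite its own earlier result.
        if b.tag < self.source_tag:
            return False
        self.value = b.value
        self.source_tag = b.tag
        return True


if __name__ == "__main__":
    # Program order: tag 0: r1 = 5, tag 1: r1 = 7, tag 2: consumer reads r1, tag 3: r1 = 9
    op = Operand(reg="r1", owner_tag=2, value=0)  # 0 = committed register-file value
    # Results arrive in a wall-clock order that differs from program order.
    for b in [Broadcast("r1", 7, 1), Broadcast("r1", 9, 3), Broadcast("r1", 5, 0)]:
        op.snoop(b)
    print(op.value, op.source_tag)  # prints "7 1"
```

Acceptance depends only on tag comparisons, so the order in which results arrive does not change which value the consumer ends up holding. In this sketch, the consumer would simply re-execute whenever `snoop` returns True.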

David Kaeli | Augustus K. Uht | Alireza Khalafi | David Morano

[1] P. Bannon, et al. Internal architecture of Alpha 21164 microprocessor, 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.

[2] Antonio González, et al. Limits of Instruction Level Parallelism with Data Speculation, 1997.

[3] R. M. Tomasulo, et al. An efficient algorithm for exploiting multiple arithmetic units, 1995.

[4] Mikko H. Lipasti, et al. The Performance Potential of Value and Dependence Prediction, 1997, Euro-Par.

[5] Kevin Skadron, et al. A Scheme for Selective Squash and Re-issue for Single-Sided Branch Hammocks, 2001.

[6] E. Smith, et al. Selective Dual Path Execution, 1996.

[7] Augustus K. Uht, et al. Disjoint eager execution: an optimal form of speculative execution, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[8] Gurindar S. Sohi, et al. Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors, 1992, MICRO.

[9] Chen-Yong Cher, et al. Skipper: a microarchitecture for exploiting control-flow independence, 2001, MICRO.

[10] Monica S. Lam, et al. Limits of control flow on parallelism, 1992, ISCA '92.

[11] John G. Cleary, et al. The architecture of an optimistic CPU: the WarpEngine, 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[12] Dirk Grunwald, et al. Selective eager execution on the PolyPath architecture, 1998, ISCA.

[13] Gurindar S. Sohi, et al. Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors, 1992, MICRO.

[14] James E. Smith, et al. The microarchitecture of superscalar processors, 1995, Proc. IEEE.

[15] Kevin Skadron, et al. HydraScalar: A Multipath-Capable Simulator, 2001.

[16] James E. Smith, et al. Complexity-Effective Superscalar Processors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[17] Mikko H. Lipasti, et al. Superspeculative Microarchitecture for Beyond AD 2000, 1997, Computer.

[18] Richard E. Kessler, et al. The Alpha 21264 microprocessor architecture, 1998, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.

[19] Thomas F. Wenisch, et al. IPC in the 10s via Resource Flow Computing with Levo, 2001.

[20] David R. Kaeli, et al. Realizing High IPC Using Time-Tagged Resource-Flow Computing, 2002, Euro-Par.

[21] Mikko H. Lipasti, et al. Exceeding the dataflow limit via value prediction, 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[22] J. E. Thornton, et al. Parallel operation in the control data 6600, 1964, AFIPS '64 (Fall, part II).

[23] Brad Calder, et al. Threaded multiple path execution, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture.