Paper information - Slice-processors: an implementation of operation-based prediction

Slice-processors: an implementation of operation-based prediction

We describe the Slice Processor micro-architecture, which implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, or the computation slice, that can be used to calculate forthcoming memory references. This is in contrast to outcome-based predictors, which exploit regularities in the (address) outcome stream. Slice processors are a generalization of existing operation-based prefetching mechanisms such as stream buffers, where the operation itself is fixed in the design (e.g., address + stride). A slice processor dynamically identifies frequently missing loads and extracts on-the-fly the relevant address computation slices. Such slices are then executed in parallel with the main sequential thread, prefetching memory data. We describe the various support structures and emphasize the design of the slice detection mechanism. We demonstrate that a relatively simple organization can significantly improve performance over an aggressive, dynamically-scheduled processor for a set of pointer-intensive programs and for some integer applications from the SPEC'95 suite. In particular, a slice processor that can detect slices of up to 8 instructions extracted over a region of up to 32 instructions improves performance by 11% on average (even if slice detection requires up to 32 cycles). Allowing slices of up to 16 instructions results in an average performance improvement of 15%. Finally, we study how our operation-based predictor interacts with an outcome-based one and find them mutually beneficial.
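To make the idea of an address computation slice concrete, the following is a minimal sketch, not the paper's hardware mechanism. It walks backward over a window of recently executed instructions and keeps only those that produce registers the missing load's address depends on. The `Instr` record, the `extract_slice` function, and the register names are hypothetical, introduced only for illustration; the sketch tracks register dependences only and ignores dependences through memory. The 8-instruction slice limit mirrors the figure in the abstract, where the window would hold up to 32 instructions.

```python
from collections import namedtuple

# Hypothetical instruction record: mnemonic, destination register, source registers.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])

def extract_slice(window, load_index, max_slice_len=8):
    """Return the backward slice of the load at window[load_index].

    Scans older instructions from youngest to oldest and keeps those whose
    destination register is still needed to compute the load's address.
    Register dependences only; memory dependences are ignored (assumption).
    """
    needed = set(window[load_index].srcs)   # registers whose producers we seek
    picked = [load_index]
    for i in range(load_index - 1, -1, -1):
        if len(picked) >= max_slice_len:    # slice size limit reached
            break
        instr = window[i]
        if instr.dest is not None and instr.dest in needed:
            needed.discard(instr.dest)      # this producer satisfies the need
            needed.update(instr.srcs)       # its own inputs are now needed
            picked.append(i)
    picked.reverse()                        # restore program order
    return [window[i] for i in picked]

# A small pointer-chasing window; the last instruction is the missing load.
window = [
    Instr("add",   "r5", ("r5", "r6")),   # unrelated arithmetic
    Instr("load",  "r1", ("r1",)),        # r1 = mem[r1], e.g. p = p->next
    Instr("mul",   "r7", ("r5", "r5")),   # unrelated arithmetic
    Instr("addi",  "r2", ("r1",)),        # r2 = r1 + offset
    Instr("store", None, ("r7", "r3")),   # unrelated store
    Instr("load",  "r4", ("r2",)),        # target load: r4 = mem[r2]
]

slice_ops = [ins.op for ins in extract_slice(window, 5)]
```

Here the extracted slice contains only the pointer load, the address add, and the target load. The unrelated arithmetic and the store are dropped, which is what lets a slice run ahead of the main thread.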

Dionisios N. Pnevmatikatos | Andreas Moshovos | Amirali Baniasadi
