Slice-processors: an implementation of operation-based prediction

We describe the Slice Processor micro-architecture, which implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operations, or computation slice, that can be used to calculate forthcoming memory references. This is in contrast to outcome-based predictors, which exploit regularities in the (address) outcome stream. Slice processors generalize existing operation-based prefetching mechanisms, such as stream buffers, in which the operation itself is fixed in the design (e.g., address + stride). A slice processor dynamically identifies frequently missing loads and extracts, on the fly, the relevant address computation slices. These slices are then executed in parallel with the main sequential thread to prefetch memory data. We describe the various support structures and emphasize the design of the slice detection mechanism. We demonstrate that a relatively simple organization can significantly improve performance over an aggressive, dynamically-scheduled processor for a set of pointer-intensive programs and for some integer applications from the SPEC'95 suite. In particular, a slice processor that can detect slices of up to 8 instructions extracted over a region of up to 32 instructions improves performance by 11% on average (even if slice detection requires up to 32 cycles). Allowing slices of up to 16 instructions results in an average performance improvement of 15%. Finally, we study how our operation-based predictor interacts with an outcome-based one and find them mutually beneficial.
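To make the idea of an address computation slice concrete, the following is a minimal software sketch of backward slice extraction over a window of recently committed instructions. It is an illustration only, not the hardware detection mechanism of the paper: the `Instr` record, the `extract_slice` function, and the register names are hypothetical, only register dependences are tracked (dependences through memory are ignored), and the caller is assumed to pass a window already limited to the region size (e.g., the last 32 instructions).

```python
from collections import namedtuple

# Hypothetical instruction record (an assumption for this sketch):
# program counter, destination register, source registers, load flag.
Instr = namedtuple("Instr", ["pc", "dest", "srcs", "is_load"])


def extract_slice(window, max_slice=8):
    """Walk backward from the last instruction in `window` (the missing load)
    and collect the instructions that produce the registers its address
    depends on. Returns the slice in program order, or None when the slice
    would exceed `max_slice` instructions."""
    target = window[-1]
    needed = set(target.srcs)      # registers whose producers are still unknown
    collected = [target]
    for instr in reversed(window[:-1]):
        if instr.dest is not None and instr.dest in needed:
            collected.append(instr)
            if len(collected) > max_slice:
                return None        # slice too long for the assumed slice storage
            # This instruction defines the register; now its own inputs are needed.
            needed.discard(instr.dest)
            needed.update(instr.srcs)
    collected.reverse()
    return collected


# A linked-list traversal: only the pointer chase feeds the missing load.
window = [
    Instr(0, "r3", ("r3", "r2"), False),  # sum += value
    Instr(1, "r4", ("r4",), False),       # count += 1
    Instr(2, "r1", ("r1",), True),        # p = p->next
    Instr(3, "r2", ("r1",), True),        # value = p->val  (frequently missing load)
]
slice_pcs = [i.pc for i in extract_slice(window)]
print(slice_pcs)  # [2, 3]
```

In this example the slice contains only the pointer-chasing load and the missing load itself; the accumulation and counter updates are left out, which is what would let such a slice run ahead of the main thread.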
