Tolerating Cache-Miss Latency with Multipass Pipelines

Microprocessors exploit instruction-level parallelism and tolerate memory-access latencies to achieve high-performance. Out-of-order microprocessors do this by dynamically scheduling instruction execution, but require power-hungry hardware structures. This article describes multipass pipelining, a microarchitectural model that provides an alternative to out-of-order execution for tolerating memory access latencies. We call our approach "flea-flicker" multipass pipelining because it uses two (or more) passes of preexecution or execution to achieve performance efficacy. Multipass pipelining assumes compile-time scheduling for lower-power and lower-complexity exploitation of instruction-level parallelism

[1]  Larry L. Biro,et al.  Power considerations in the design of the Alpha 21264 microprocessor , 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).

[2]  Onur Mutlu,et al.  On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor , 2005, IEEE Computer Architecture Letters.

[3]  Onur Mutlu,et al.  Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[4]  E. You,et al.  A third-generation SPARC V9 64-b microprocessor , 2000, IEEE Journal of Solid-State Circuits.

[5]  Wen-mei W. Hwu,et al.  "Flea-flicker" multipass pipelining: an alternative to the high-power out-of-order offense , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[6]  Wen-mei W. Hwu,et al.  Field-testing IMPACT EPIC research results in Itanium 2 , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[7]  Trevor Mudge,et al.  Improving data cache performance by pre-executing instructions under a cache miss , 1997 .

[8]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[9]  Cameron McNairy,et al.  Itanium 2 Processor Microarchitecture , 2003, IEEE Micro.