Paper Information - Tolerating Cache-Miss Latency with Multipass Pipelines

Tolerating Cache-Miss Latency with Multipass Pipelines

Microprocessors exploit instruction-level parallelism and tolerate memory-access latencies to achieve high performance. Out-of-order microprocessors do this by dynamically scheduling instruction execution, but they require power-hungry hardware structures. This article describes multipass pipelining, a microarchitectural model that provides an alternative to out-of-order execution for tolerating memory-access latencies. We call our approach "flea-flicker" multipass pipelining because it uses two (or more) passes of preexecution or execution to achieve performance efficacy. Multipass pipelining assumes compile-time scheduling for lower-power and lower-complexity exploitation of instruction-level parallelism.

Wen-mei W. Hwu | Ronald D. Barnes | Shane Ryoo

[1] Larry L. Biro, et al. Power considerations in the design of the Alpha 21264 microprocessor, 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).

[2] Onur Mutlu, et al. On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor, 2005, IEEE Computer Architecture Letters.

[3] Onur Mutlu, et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors, 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.

[4] E. You, et al. A third-generation SPARC V9 64-b microprocessor, 2000, IEEE Journal of Solid-State Circuits.

[5] Wen-mei W. Hwu, et al. "Flea-flicker" multipass pipelining: an alternative to the high-power out-of-order offense, 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[6] Wen-mei W. Hwu, et al. Field-testing IMPACT EPIC research results in Itanium 2, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.

[7] Trevor Mudge, et al. Improving data cache performance by pre-executing instructions under a cache miss, 1997.

[8] Margaret Martonosi, et al. Wattch: a framework for architectural-level power analysis and optimizations, 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[9] Cameron McNairy, et al. Itanium 2 Processor Microarchitecture, 2003, IEEE Micro.