High-performance throughput computing

Chip multithreading (CMT) processors offer a way to significantly improve the performance of computer systems. The return on investment for multithreading is among the highest of any microarchitectural technique. A core designed from scratch to support multithreading can achieve gains as high as 3× for only a 20 percent increase in area. Even with throughput performance as the main target, we have shown that the microarchitecture needed to support threads on a CMT can also achieve high single-thread performance. Hardware scouting, which Sun is implementing on the Rock microprocessor, can increase the single-thread performance of applications by up to 40 percent. Viewed another way, scouting makes the on-chip caches appear much larger. It also serves as a performance-robustness technique, compensating for code tuned for different on-chip cache sizes or even for a different number of caches and cache levels.
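The abstract describes hardware scouting only at a high level. The following is a toy Python model of the general idea, not Sun's Rock design. It assumes a single core, a trace of loads whose addresses never depend on a missing load, an unbounded cache, and hypothetical latencies. The names `simulate`, `HIT_CYCLES`, and `miss_latency` are illustrative only. The model shows where the benefit comes from: cycles that would otherwise be spent stalled on a miss are used to run ahead and prefetch later lines, so their misses overlap with the first one.

```python
"""Toy model of hardware scouting (hypothetical parameters, not Rock's design).

Each trace entry is one load to a cache-line ID. A load that hits costs
HIT_CYCLES; a demand miss stalls the core until the line arrives. With
scouting, the core uses the stall window to run ahead past the missing
load and prefetch upcoming lines, then resumes from the checkpointed load.
"""

HIT_CYCLES = 1  # assumed cost of executing one load that hits


def simulate(trace: list[int], scouting: bool, miss_latency: int = 200) -> int:
    """Return the total cycles needed to execute `trace`."""
    ready_at: dict[int, int] = {}  # line -> cycle at which it is in the cache
    cycle = 0
    for i, line in enumerate(trace):
        if line not in ready_at:
            # Demand miss: request the line from memory.
            ready_at[line] = cycle + miss_latency
            if scouting:
                # Scout mode: run ahead during the miss shadow. Scouted results
                # are discarded; only the prefetches issued have a lasting effect.
                scout_cycle = cycle
                for ahead in trace[i + 1:]:
                    if scout_cycle >= ready_at[line]:
                        break  # the miss has returned; leave scout mode
                    scout_cycle += HIT_CYCLES
                    if ahead not in ready_at:
                        ready_at[ahead] = scout_cycle + miss_latency
        # Stall until the line is present (possibly prefetched), then execute.
        cycle = max(cycle, ready_at[line]) + HIT_CYCLES
    return cycle


trace = [0, 1, 2, 3]  # four loads to four distinct, initially uncached lines
print(simulate(trace, scouting=False))  # → 804: four serialized misses
print(simulate(trace, scouting=True))   # → 204: misses overlap under one stall
```

Because this model ignores dependent loads, finite cache capacity, and memory bandwidth, its speedup is not comparable to the up-to-40-percent figure reported above; it only illustrates how scouting converts stall time into memory-level parallelism.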
