On the importance of optimizing the configuration of stream prefetchers
[1] Yonghong Song, et al. Processor Aware Anticipatory Prefetching in Loops, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[2] Onur Mutlu, et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors, 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[3] Brad Calder, et al. Predictor-directed stream buffers, 2000, MICRO 33.
[4] Anoop Gupta, et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[5] Norman P. Jouppi, et al. How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors?, 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[6] Trevor Mudge, et al. Improving data cache performance by pre-executing instructions under a cache miss, 1997.
[7] Gary S. Tyson, et al. A prefetch taxonomy, 2004, IEEE Transactions on Computers.
[8] Janak H. Patel, et al. Stride directed prefetching in scalar processors, 1992, MICRO 1992.
[9] Brad Calder, et al. Automatically characterizing large scale program behavior, 2002, ASPLOS X.
[10] Stamatis Vassiliadis, et al. A load-instruction unit for pipelined processors, 1993, IBM J. Res. Dev.
[11] Todd M. Austin, et al. MASE: a novel infrastructure for detailed microarchitectural modeling, 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.
[12] Sally A. McKee, et al. Hardware-only stream prefetching and dynamic access ordering, 2000, ICS '00.
[13] P. Chow, et al. Memory-system Design Considerations For Dynamically-scheduled Processors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[14] Santosh G. Abraham, et al. Effective stream-based and execution-based data prefetching, 2004, ICS '04.
[15] Richard E. Kessler, et al. Evaluating stream buffers as a secondary cache replacement, 1994, Proceedings of 21 International Symposium on Computer Architecture.
[16] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre, 1990, ISCA 1990.
[17] Balaram Sinharoy, et al. POWER4 system microarchitecture, 2002, IBM J. Res. Dev.
[18] Michel Dubois, et al. Sequential Hardware Prefetching in Shared-Memory Multiprocessors, 1995, IEEE Trans. Parallel Distributed Syst.
[19] José F. Martínez, et al. Checkpointed early load retirement, 2005, 11th International Symposium on High-Performance Computer Architecture.