Continuous runahead: Transparent hardware acceleration for memory intensive workloads
[1] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[2] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[3] Trevor Mudge,et al. Improving data cache performance by pre-executing instructions under a cache miss , 1997 .
[4] Dirk Grunwald,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[5] Andreas Moshovos,et al. Dependence based prefetching for linked data structures , 1998, ASPLOS VIII.
[6] Simultaneous subordinate microthreading (SSMT) , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[7] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.
[8] Eric Rotenberg,et al. Slipstream processors: improving both performance and fault tolerance , 2000, SIGP.
[9] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001 .
[10] Jignesh M. Patel,et al. Data prefetching by dependence graph precomputation , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[11] Craig Zilles,et al. Execution-based prediction using speculative slices , 2001, ISCA 2001.
[12] Babak Falsafi,et al. Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.
[13] John Paul Shen,et al. Speculative precomputation: long-range prefetching of delinquent loads , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[14] John Paul Shen,et al. Dynamic speculative precomputation , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[15] Chi-Keung Luk,et al. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[16] John Paul Shen,et al. Speculative Precomputation on Chip Multiprocessors , 2002 .
[17] Dirk Grunwald,et al. A stateless, content-directed data prefetching mechanism , 2002, ASPLOS X.
[18] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[19] Balaram Sinharoy,et al. POWER4 system microarchitecture , 2002, IBM J. Res. Dev..
[20] Donald Yeung,et al. Design and evaluation of compiler algorithms for pre-execution , 2002, ASPLOS X.
[21] Onur Mutlu,et al. Runahead Execution: An Effective Alternative to Large Instruction Windows , 2003, IEEE Micro.
[22] Onur Mutlu,et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[23] James E. Smith,et al. Data Cache Prefetching Using a Global History Buffer , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[24] Haitham Akkary,et al. Continual flow pipelines , 2004, ASPLOS XI.
[25] Onur Mutlu,et al. On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor , 2005, IEEE Computer Architecture Letters.
[26] Onur Mutlu,et al. Address-value delta (AVD) prediction: increasing the effectiveness of runahead execution by exploiting regular memory allocation patterns , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[27] Wei-Chung Hsu,et al. Dynamic helper threaded prefetching on the Sun UltraSPARC® CMP processor , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[28] Onur Mutlu,et al. Techniques for efficient processing in runahead execution engines , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[29] Huiyang Zhou,et al. Dual-core execution: building a highly scalable single-thread instruction window , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[30] Sanjay J. Patel,et al. Beating in-order stalls with "flea-flicker" two-pass pipelining , 2006, IEEE Transactions on Computers.
[31] Thomas F. Wenisch,et al. Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[32] Onur Mutlu,et al. Efficient runahead execution processors , 2006 .
[33] Onur Mutlu,et al. Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance , 2006, IEEE Micro.
[34] Onur Mutlu,et al. A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[35] Weifeng Zhang,et al. Accelerating and Adapting Precomputation Threads for Efficient Prefetching , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[36] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[37] Onur Mutlu,et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.
[38] Stijn Eyerman,et al. System-Level Performance Metrics for Multiprogram Workloads , 2008, IEEE Micro.
[39] Michael C. Huang,et al. A performance-correctness explicitly-decoupled architecture , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[40] Mateo Valero,et al. Runahead Threads to improve SMT performance , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[41] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches , 2009 .
[42] Stijn Eyerman,et al. MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor , 2008, HiPEAC.
[43] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[44] Onur Mutlu,et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[45] Mateo Valero,et al. Efficient Runahead Threads , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[46] Dean M. Tullsen,et al. Inter-core prefetching for multicore processors using migrating helper threads , 2011, ASPLOS XVI.
[47] David R. Kaeli,et al. Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[48] Yale N. Patt,et al. Filtered runahead execution with a runahead buffer , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[49] Onur Mutlu,et al. Accelerating Dependent Cache Misses with an Enhanced Memory Controller , 2016, ISCA.
[50] Milad Hashemi,et al. On-Chip Mechanisms to Reduce Effective Memory Access Latency , 2016, ArXiv.