Informed Prefetching for Indirect Memory Accesses
Resit Sendag | Joshua J. Yi | Mustafa Cavus
[1] Sam Ainsworth, et al. An Event-Triggered Programmable Prefetcher for Irregular Workloads, 2018, ASPLOS.
[2] Richard W. Vuduc, et al. When Prefetching Works, When It Doesn’t, and Why, 2012, TACO.
[3] Norishige Chiba, et al. Arboricity and Subgraph Listing Algorithms, 1985, SIAM J. Comput.
[4] Todd C. Mowry, et al. Tolerating latency through software-controlled data prefetching, 1994.
[5] Andreas Moshovos, et al. Dependence based prefetching for linked data structures, 1998, ASPLOS VIII.
[6] Thomas F. Wenisch, et al. Temporal streaming of shared memory, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[7] Somayeh Sardashti, et al. The gem5 simulator, 2011, CARN.
[8] Donald Yeung, et al. Design and evaluation of compiler algorithms for pre-execution, 2002, ASPLOS X.
[9] Thomas F. Wenisch, et al. Spatial Memory Streaming, 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[10] Todd C. Mowry, et al. Compiler-based prefetching for recursive data structures, 1996, ASPLOS VII.
[11] David J. Lilja, et al. Data prefetch mechanisms, 2000, CSUR.
[12] Craig Zilles, et al. Execution-based prediction using speculative slices, 2001, ISCA 2001.
[13] Rakesh Krishnaiyer, et al. Value-Profile Guided Stride Prefetching for Irregular Code, 2002, CC.
[14] Thomas F. Wenisch, et al. Practical off-chip meta-data for temporal memory streaming, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[15] Mikko H. Lipasti, et al. Partial resolution in branch target buffers, 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.
[16] Kathryn S. McKinley, et al. Guided region prefetching: a cooperative hardware/software approach, 2003, ISCA '03.
[17] Ken Kennedy, et al. Software prefetching, 1991, ASPLOS IV.
[18] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[19] Srinivas Devadas, et al. IMP: Indirect memory prefetcher, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[20] Brad Calder, et al. Pointer cache assisted prefetching, 2002, MICRO.
[21] Donald Yeung, et al. Multicore Performance Optimization Using Partner Cores, 2011, HotPar.
[22] Gary S. Tyson, et al. A prefetch taxonomy, 2004, IEEE Transactions on Computers.
[23] James E. Smith, et al. Data Cache Prefetching Using a Global History Buffer, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[24] Omer Khan, et al. CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores, 2015, 2015 IEEE International Symposium on Workload Characterization.
[25] Richard E. Kessler, et al. Evaluating stream buffers as a secondary cache replacement, 1994, Proceedings of the 21st International Symposium on Computer Architecture.
[26] Yuan Chou, et al. Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[27] Martin Burtscher, et al. Efficient emulation of hardware prefetchers via event-driven helper threading, 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Dirk Grunwald, et al. A stateless, content-directed data prefetching mechanism, 2002, ASPLOS X.
[29] Norman P. Jouppi, et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[30] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[31] Vijayalakshmi Srinivasan, et al. Exploring the limits of prefetching, 2005, IBM J. Res. Dev.
[32] Josep Torrellas, et al. Using a user-level memory thread for correlation prefetching, 2002, ISCA.
[33] Onur Mutlu, et al. Continuous runahead: Transparent hardware acceleration for memory intensive workloads, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[34] Margaret Martonosi, et al. DeSC: Decoupled supply-compute communication management for heterogeneous architectures, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[35] Onur Mutlu, et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[36] Jean-Loup Baer, et al. Effective Hardware Based Data Prefetching for High-Performance Processors, 1995, IEEE Trans. Computers.
[37] Sam Ainsworth, et al. Software prefetching for indirect memory accesses, 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[38] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW 1999.
[39] Thomas F. Wenisch, et al. Mechanisms for store-wait-free multiprocessors, 2007, ISCA '07.
[40] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[41] Douglas J. Joseph, et al. Prefetching Using Markov Predictors, 1997, Proceedings of the 24th Annual International Symposium on Computer Architecture.
[42] John Paul Shen, et al. Dynamic speculative precomputation, 2001, MICRO.
[43] A. Azzouz. 2011, 2020, City.
[44] Thomas F. Wenisch, et al. Temporal streams in commercial server applications, 2008, 2008 IEEE International Symposium on Workload Characterization.
[45] Chi-Keung Luk, et al. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors, 2001, Proceedings of the 28th Annual International Symposium on Computer Architecture.
[46] Chia-Lin Yang, et al. Push vs. pull: data movement for linked data structures, 2000, ICS '00.
[47] Thomas F. Wenisch, et al. Spatio-temporal memory streaming, 2009, ISCA '09.
[48] Thomas F. Wenisch, et al. Temporal instruction fetch streaming, 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[49] Brian W. Barrett, et al. Introducing the Graph 500, 2010.
[50] Gustavo Alonso, et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware, 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[51] Gurindar S. Sohi, et al. Effective jump-pointer prefetching for linked data structures, 1999, ISCA.