RnR: A Software-Assisted Record-and-Replay Hardware Prefetcher
John Shalf | Chao Zhang | Xiaochen Guo | Yuan Zeng
[1] Srinivas Devadas,et al. IMP: Indirect memory prefetcher , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[2] Ada Gavrilovska,et al. Balancing context switch penalty and response time with elastic time slicing , 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[3] Rok Sosic,et al. SNAP , 2016, ACM Trans. Intell. Syst. Technol..
[4] Chen Ding,et al. Quantifying the cost of context switch , 2007, ExpCS '07.
[5] Rajeev Motwani,et al. The PageRank Citation Ranking: Bringing Order to the Web , 1999, WWW 1999.
[6] Calvin Lin,et al. Linearizing irregular memory accesses for improved correlated prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Thomas F. Wenisch,et al. Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[8] Hamid Sarbazi-Azad,et al. Bingo Spatial Data Prefetcher , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[9] Guy E. Blelloch,et al. Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.
[10] Alvin AuYoung,et al. Presto: distributed machine learning and graph processing with sparse matrices , 2013, EuroSys '13.
[11] Sarita V. Adve,et al. Performance of database workloads on shared-memory systems with out-of-order processors , 1998, ASPLOS VIII.
[12] Willy Zwaenepoel,et al. X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.
[13] Yen-Chen Liu,et al. Knights Landing: Second-Generation Intel Xeon Phi Product , 2016, IEEE Micro.
[14] Paul D. Franzon,et al. FreePDK: An Open-Source Variation-Aware Design Kit , 2007, 2007 IEEE International Conference on Microelectronic Systems Education (MSE'07).
[15] Pradeep Dubey,et al. Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Rohit Chandra,et al. Parallel programming in OpenMP , 2000 .
[17] Cong Du,et al. MPI-Mitten: Enabling Migration Technology in MPI , 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[18] Aamer Jaleel,et al. Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[19] Hiroyuki Kitagawa,et al. GPU-Accelerated Graph Clustering via Parallel Label Propagation , 2017, CIKM.
[20] Dam Sunwoo,et al. Temporal Prefetching Without the Off-Chip Metadata , 2019, MICRO.
[21] Christina Freytag,et al. Using MPI: Portable Parallel Programming with the Message-Passing Interface , 2016 .
[22] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..
[23] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[24] Yu He,et al. The YouTube video recommendation system , 2010, RecSys '10.
[25] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[26] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput..
[27] Frederica Darema,et al. A single-program-multiple-data computational model for EPEX/FORTRAN , 1988, Parallel Comput..
[28] Pierre Michaud. Best-offset hardware prefetching , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[29] Brad Calder,et al. Predictor-directed stream buffers , 2000, MICRO 33.
[30] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[31] James E. Smith,et al. Prefetching in supercomputer instruction caches , 1992, Proceedings Supercomputing '92.
[32] Sergey Brin,et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.
[33] Li Zhao,et al. Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[34] Hamid Sarbazi-Azad,et al. Domino Temporal Data Prefetcher , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[35] David A. Patterson,et al. The GAP Benchmark Suite , 2015, ArXiv.
[36] Seth H. Pugsley,et al. Efficiently prefetching complex address patterns , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[37] Marco Rosa,et al. HyperANF: approximating the neighbourhood function of very large graphs on a budget , 2010, WWW.
[38] Hao Wu,et al. Efficient Metadata Management for Irregular Data Prefetching , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[39] Jean-Loup Baer,et al. Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.
[40] Jack J. Dongarra,et al. High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems , 2016, Int. J. High Perform. Comput. Appl..
[41] Sam Ainsworth,et al. Software prefetching for indirect memory accesses , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[42] Pat Conway,et al. The AMD Opteron Processor for Multiprocessor Servers , 2003, IEEE Micro.
[43] James E. Smith,et al. Data Cache Prefetching Using a Global History Buffer , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[44] Daniel A. Jiménez,et al. Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[45] Martin Burtscher,et al. Bridging the processor-memory performance gap with 3D IC technology , 2005, IEEE Design & Test of Computers.
[46] Torsten Hoefler,et al. To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations , 2017, HPDC.
[47] Heiner Litz,et al. Classifying Memory Access Patterns for Prefetching , 2020, ASPLOS.
[48] Christos Faloutsos,et al. Mining large graphs: Algorithms, inference, and discoveries , 2011, 2011 IEEE 27th International Conference on Data Engineering.
[49] Sam Ainsworth,et al. An Event-Triggered Programmable Prefetcher for Irregular Workloads , 2018, ASPLOS.