Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications
Tanvir Ahmed Khan | Gilles Pokam | Joseph Devietti | Heiner Litz | Baris Kasikci | Akshitha Sriraman | Dexin Zhang
[1] Efraim Rotem,et al. Inside 6th-Generation Intel Core: New Microarchitecture Code-Named Skylake , 2017, IEEE Micro.
[2] James E. Smith,et al. Path-based next trace prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[3] Kei Hiraki,et al. Inter-reference gap distribution replacement: an improved replacement algorithm for set-associative caches , 2004, ICS '04.
[4] Harish Patil,et al. Ispike: a post-link optimizer for the Intel® Itanium® architecture , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[5] C. Wilkerson,et al. A Dueling Segmented LRU Replacement Algorithm with Adaptive Bypassing , 2010 .
[6] Hamid Sarbazi-Azad,et al. MANA: Microarchitecting an Instruction Prefetcher , 2021, ArXiv.
[7] Thomas F. Wenisch,et al. A Primer on Hardware Prefetching , 2014, Synthesis Lectures on Computer Architecture.
[8] Zhe Wang,et al. Perceptron learning for reuse prediction , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] Thomas F. Wenisch,et al. SoftSKU: Optimizing Server Architectures for Microservice Diversity @Scale , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[10] Sang Lyul Min,et al. On the existence of a spectrum of policies that subsumes the least recently used (LRU) and least frequently used (LFU) policies , 1999, SIGMETRICS '99.
[11] Jaehyuk Huh,et al. Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[12] Tipp Moseley,et al. AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[13] Reena Panda,et al. B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors , 2012, IEEE Computer Architecture Letters.
[14] Tanvir Ahmed Khan,et al. DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling , 2021, OSDI.
[15] Ben Niu,et al. Reverse Debugging of Kernel Failures in Deployed Systems , 2020, USENIX Annual Technical Conference.
[16] Hamid Sarbazi-Azad,et al. Divide and Conquer Frontend Bottleneck , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[17] Daniel A. Jiménez. Insertion and promotion for tree-based PseudoLRU last-level caches , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Yannis Smaragdakis,et al. Adaptive Caches: Effective Shaping of Cache Behavior to Workloads , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[19] Adam M. Izraelevitz,et al. The Rocket Chip Generator , 2016 .
[20] K. Kavi. Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .
[21] Dharmendra S. Modha,et al. CAR: Clock with Adaptive Replacement , 2004, FAST.
[22] Alan Jay Smith,et al. Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.
[23] Calvin Lin,et al. Applying Deep Learning to the Cache Replacement Problem , 2019, MICRO.
[24] Daniel A. Jiménez,et al. Evolution of the Samsung Exynos CPU Microarchitecture , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[25] Mateo Valero,et al. Improving Cache Management Policies Using Dynamic Reuse Distances , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[26] Boris Grot,et al. Leeway: Addressing Variability in Dead-Block Prediction for Last-Level Caches , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[27] Kevin Swersky,et al. An Imitation Learning Approach for Cache Replacement , 2020, ICML.
[28] Yannis Smaragdakis,et al. EELRU: simple and effective adaptive page replacement , 1999, SIGMETRICS '99.
[29] Christoforos E. Kozyrakis,et al. Memory Hierarchy for Web Search , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[30] André Seznec,et al. The FNL+MMA Instruction Cache Prefetcher , 2020 .
[31] George Candea,et al. Failure sketching: a technique for automated root cause diagnosis of in-production failures , 2015, SOSP.
[32] Guilherme Ottoni,et al. Optimizing function placement for large-scale data-center applications , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[33] David Xinliang Li,et al. Lightweight feedback-directed cross-module optimization , 2010, CGO '10.
[34] Thomas F. Wenisch,et al. Temporal instruction fetch streaming , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[35] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[36] Mikko H. Lipasti. Cache Replacement Policies , 2016 .
[37] Daniel A. Jiménez,et al. Multiperspective Reuse Prediction , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Pierre Michaud,et al. PIPS: Prefetching Instructions with Probabilistic Scouts , 2020 .
[39] Eric Rotenberg,et al. Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[40] Kei Hiraki,et al. Access Map Pattern Matching for High Performance Data Cache Prefetch , 2011, J. Instr. Level Parallelism.
[41] Daniel A. Jiménez,et al. The Temporal Ancestry Prefetcher , 2020 .
[42] Akanksha Jain,et al. Back to the Future: Leveraging Belady's Algorithm for Improved Cache Replacement , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[43] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J.
[44] George Candea,et al. Failure Sketches: A Better Way to Debug , 2015, HotOS.
[45] Jean-Loup Baer,et al. Reducing memory latency via non-blocking and prefetching caches , 1992, ASPLOS V.
[46] Laszlo A. Belady,et al. A Study of Replacement Algorithms for a Virtual-Storage Computer , 1966, IBM Syst. J.
[47] Onur Mutlu,et al. A Case for MLP-Aware Cache Replacement , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[48] Carole-Jean Wu,et al. SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[49] Thomas F. Wenisch,et al. RDIP: Return-address-stack Directed Instruction Prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[50] Andrea Rosà,et al. Renaissance: benchmarking suite for parallel applications on the JVM , 2019, PLDI.
[51] Todd C. Mowry,et al. Cooperative prefetching: compiler and hardware support for effective instruction prefetching in modern processors , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[52] Aamer Jaleel,et al. Adaptive insertion policies for high performance caching , 2007, ISCA '07.
[53] Cheng-Chieh Huang,et al. Boomerang: A Metadata-Free Architecture for Control Flow Delivery , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[54] Carole-Jean Wu,et al. PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[55] Mahesh Subramony,et al. The AMD “Zen 2” Processor , 2020, IEEE Micro.
[56] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.
[57] Neelu Shivprakash Kalani,et al. Run-Jump-Run: Bouquet of Instruction Pointer Jumpers for High Performance Instruction Prefetching , 2020 .
[58] Babak Falsafi,et al. SHIFT: Shared history instruction fetch for lean-core server processors , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[59] Jinchun Kim,et al. Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy , 2017, ASPLOS.
[60] Alan Jay Smith,et al. Sequential Program Prefetching in Memory Hierarchies , 1978, Computer.
[61] Boris Grot,et al. Blasting through the Front-End Bottleneck with Shotgun , 2018, ASPLOS.
[62] Peter J. Denning,et al. Thrashing: its causes and prevention , 1968, AFIPS Fall Joint Computing Conference.
[63] Babak Falsafi,et al. Proactive instruction fetch , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[64] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[65] Tanvir Ahmed Khan,et al. I-SPY: Context-Driven Conditional Instruction Prefetching with Coalescing , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[66] Michael F. P. O'Boyle,et al. IATAC: a smart predictor to turn-off L2 cache lines , 2005, TACO.
[67] Margaret Martonosi,et al. Timekeeping in the memory system: predicting and optimizing memory behavior , 2002, ISCA.
[68] Alberto Ros,et al. The Entangling Instruction Prefetcher , 2020, IEEE Computer Architecture Letters.
[69] Babak Falsafi,et al. Confluence: Unified instruction supply for scale-out servers , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[70] Christoforos E. Kozyrakis,et al. AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).
[71] H. Irie,et al. D-JOLT: Distant Jolt Prefetcher , 2020 .
[72] Ben Niu,et al. Lazy Diagnosis of In-Production Concurrency Bugs , 2017, SOSP.
[73] Guilherme Ottoni,et al. BOLT: A Practical Binary Optimizer for Data Centers and Beyond , 2018, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[74] Christoforos E. Kozyrakis,et al. ZSim: fast and accurate microarchitectural simulation of thousand-core systems , 2013, ISCA.
[75] Gerhard Weikum,et al. The LRU-K page replacement algorithm for database disk buffering , 1993, SIGMOD Conference.
[76] Dam Sunwoo,et al. Rebasing Instruction Prefetching: An Industry Perspective , 2020, IEEE Computer Architecture Letters.
[77] Samira Manabi Khan,et al. Sampling Dead Block Prediction for Last-Level Caches , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[78] Heiner Litz,et al. Classifying Memory Access Patterns for Prefetching , 2020, ASPLOS.
[79] Texas,et al. BARÇA: Branch Agnostic Region Searching Algorithm , 2020 .
[80] Daniel A. Jiménez,et al. Exploring Predictive Replacement Policies for Instruction Cache and Branch Target Buffer , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[81] Yan Solihin,et al. Counter-based cache replacement algorithms , 2005, 2005 International Conference on Computer Design.
[82] Amer Diwan,et al. The DaCapo benchmarks: Java benchmarking development and analysis , 2006, OOPSLA '06.
[83] Calvin Lin,et al. Rethinking Belady's Algorithm to Accommodate Prefetching , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[84] Ben Niu,et al. REPT: Reverse Debugging of Failures in Deployed Software , 2018, OSDI.
[85] Jean-Loup Baer,et al. Modified LRU policies for improving second-level cache behavior , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6.
[86] Gu-Yeon Wei,et al. Profiling a warehouse-scale computer , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[87] Pramod Bhatotia,et al. Execution reconstruction: harnessing failure reoccurrences for failure reproduction , 2021, PLDI.
[88] Guilherme Ottoni,et al. Lightning BOLT: powerful, fast, and scalable binary optimization , 2021, CC.
[89] Glenn Reinman,et al. Fetch directed instruction prefetching , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[90] Kei Hiraki,et al. Unified memory optimizing architecture: memory subsystem control with a unified predictor , 2012, ICS '12.
[91] J. Spencer Love,et al. Caching strategies to improve disk system performance , 1994, Computer.
[92] Jinson Koppanalil,et al. The Arm Neoverse N1 Platform: Building Blocks for the Next-Gen Cloud-to-Edge Infrastructure SoC , 2020, IEEE Micro.
[93] Onur Mutlu,et al. The evicted-address filter: A unified mechanism to address both cache pollution and thrashing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).