MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime

Far-memory techniques that enable applications to use remote memory are increasingly appealing in modern data centers, supporting applications’ large memory footprint and improving machines’ resource utilization. Unfortunately, most far-memory techniques focus on OS-level optimizations and are agnostic to managed runtimes and garbage collections (GC) underneath applications written in high-level languages. With different object-access patterns from applications, GC can severely interfere with existing far-memory techniques, breaking remote memory prefetching algorithms and causing severe local-memory misses. We developed MemLiner, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. We implemented MemLiner in two widely-used GCs in OpenJDK: G1 and Shenandoah. Our evaluation with a range of widely-deployed cloud systems shows MemLiner improves applications’ end-to-end performance by up to 2.5×.

Yifan Qiao | Shan Lu | Jon Eyolfson | G. Xu | Chenxi Wang | Haoran Ma | Shicheng Liu | Christian Navasca

[1] Yifan Qiao, et al. Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory, 2022, NSDI.

[2] Yifan Qiao, et al. Mako: a low-pause, high-throughput evacuating collector for memory-disaggregated datacenters, 2022, PLDI.

[3] Yutong Huang, et al. Clio: a hardware-software co-designed disaggregated memory system, 2021, ASPLOS.

[4] John N. Zigman, et al. Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories, 2022, ACM Trans. Comput. Syst.

[5] Onur Mutlu, et al. Rethinking software runtimes for disaggregated memory, 2021, ASPLOS.

[6] Marcos K. Aguilera, et al. Can far memory improve job throughput?, 2020, EuroSys.

[7] Mor Harchol-Balter, et al. Borg: the next generation, 2020, EuroSys.

[8] David Sidler, et al. StRoM: smart remote memory, 2020, EuroSys.

[9] Mark Silberstein, et al. Lynx: A SmartNIC-driven Accelerator-centric Architecture for Network Servers, 2020, ASPLOS.

[10] Mosharaf Chowdhury, et al. Effectively Prefetching Remote Memory with Leap, 2019, USENIX ATC.

[11] Siddhartha Sen, et al. Disaggregation and the Application, 2019, HotCloud.

[12] Marcos K. Aguilera, et al. AIFM: High-Performance, Application-Integrated Far Memory, 2020, OSDI.

[13] Binyu Zang, et al. Platinum: A CPU-Efficient Concurrent Garbage Collector for Tail-Reduction of Interactive Services, 2020, USENIX ATC.

[14] Miryung Kim, et al. Semeru: A Memory-Disaggregated Managed Runtime, 2020, OSDI.

[15] Joshua Fried, et al. Caladan: Mitigating Interference at Microsecond Timescales, 2020, OSDI.

[16] Miryung Kim, et al. Gerenuk: thin computation over big native data using speculative program transformation, 2019, SOSP.

[17] Onur Mutlu, et al. Panthera: holistic memory management for big data processing over hybrid memories, 2019, PLDI.

[18] Marcos K. Aguilera, et al. Designing Far Memory Data Structures: Think Outside the Box, 2019, HotOS.

[19] Jichuan Chang, et al. Software-Defined Far Memory in Warehouse-Scale Computers, 2019, ASPLOS.

[20] Hakim Weatherspoon, et al. Shoal: A Network Architecture for Disaggregated Racks, 2019, NSDI.

[21] Yiying Zhang, et al. LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation, 2018, OSDI.

[22] Haibo Chen, et al. Espresso: Brewing Java For More Non-Volatility with Non-volatile Memory, 2017, ASPLOS.

[23] Kejiang Ye, et al. Imbalance in the cloud: An analysis on Alibaba cluster trace, 2017, IEEE International Conference on Big Data (Big Data).

[24] Amanda Carbonari, et al. Tolerating Faults in Disaggregated Datacenters, 2017, HotNets.

[25] Marcos K. Aguilera, et al. Remote memory in the age of fast networks, 2017, SoCC.

[26] Kang G. Shin, et al. Efficient Memory Disaggregation with Infiniswap, 2017, NSDI.

[27] Scott Shenker, et al. Network Requirements for Resource Disaggregation, 2016, OSDI.

[28] Lu Fang, et al. Yak: A High-Performance Big-Data-Friendly Garbage Collector, 2016, OSDI.

[29] Andrew Dinn, et al. Shenandoah: An open-source concurrent compacting garbage collector for OpenJDK, 2016, PPPJ.

[30] Sparsh Mittal, et al. A Survey of Recent Prefetching Techniques for Processor Caches, 2016, ACM Comput. Surv.

[31] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.

[32] John Kubiatowicz, et al. Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications, 2016, ASPLOS.

[33] Lu Fang, et al. Interruptible tasks: treating memory pressure as interrupts for highly scalable data-parallel programs, 2015, SOSP.

[34] Ashish Gupta, et al. The RAMCloud Storage System, 2015, ACM Trans. Comput. Syst.

[35] Kimberly Keeton, et al. The Machine: An Architecture for Memory-centric Computing, 2015, ROSS@HPDC.

[36] Kiyoung Choi, et al. PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture, 2015, ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[37] Lu Fang, et al. FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications, 2015, ASPLOS.

[38] Nhan Nguyen, et al. NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines, 2015, ASPLOS.

[39] Yang Liu, et al. Willow: A User-Programmable SSD, 2014, OSDI.

[40] Michael Kaminsky, et al. Using RDMA efficiently for key-value services, 2014, SIGCOMM.

[41] Miguel Castro, et al. FaRM: Fast Remote Memory, 2014, NSDI.

[42] Hyeontaek Lim, et al. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage, 2014, NSDI.

[43] Krste Asanovic, et al. FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers, 2014.

[44] Scott Shenker, et al. Network support for resource disaggregation in next-generation datacenters, 2013, HotNets.

[45] Engin Ipek, et al. PARDIS: A programmable memory controller for the DDRx interfacing standards, 2012, 39th Annual International Symposium on Computer Architecture (ISCA).

[46] Thomas F. Wenisch, et al. System-level implications of disaggregated memory, 2012, IEEE International Symposium on High-Performance Computer Architecture.

[47] Richard E. Jones, et al. The Garbage Collection Handbook: The art of automatic memory management, 2011, Chapman and Hall / CRC Applied Algorithms and Data Structures Series.

[48] L. Barroso. Warehouse-Scale Computing: Entering the Teenage Decade, 2011, SIGARCH Comput. Archit. News.

[49] Michael Wolf, et al. C4: the continuously concurrent compacting collector, 2011, ISMM '11.

[50] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.

[51] Edith Schonberg, et al. Finding low-utility data structures, 2010, PLDI '10.

[52] Thomas F. Wenisch, et al. Disaggregated memory for expansion and sharing in blade servers, 2009, ISCA '09.

[53] Onur Mutlu, et al. Architecting phase change memory as a scalable DRAM alternative, 2009, ISCA '09.

[54] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.

[55] Erez Petrank, et al. The Compressor: concurrent, incremental, and parallel compaction, 2006, PLDI '06.

[56] Michael Wolf, et al. The pauseless GC algorithm, 2005, VEE '05.

[57] David Detlefs, et al. Garbage-first garbage collection, 2004, ISMM '04.

[58] Taiichi Yuasa, et al. Real-time garbage collection on general-purpose machines, 1990, J. Syst. Softw.