Paper Information - Learning-based Memory Allocation for C++ Server Workloads

Modern C++ servers have memory footprints that vary widely over time, causing persistent heap fragmentation of up to 2x from long-lived objects allocated during peak memory usage. This fragmentation is exacerbated by the use of huge (2MB) pages, a requirement for high performance on large heap sizes. Reducing fragmentation automatically is challenging because C++ memory managers cannot move objects. This paper presents a new approach to huge page fragmentation. It combines modern machine learning techniques with a novel memory manager (LLAMA) that manages the heap based on object lifetimes and huge pages (divided into blocks and lines). A neural network-based language model predicts lifetime classes using symbolized calling contexts. The model learns context-sensitive per-allocation site lifetimes from previous runs, generalizes over different binary versions, and extrapolates from samples to unobserved calling contexts. Instead of size classes, LLAMA's heap is organized by lifetime classes that are dynamically adjusted based on observed behavior at a block granularity. LLAMA reduces memory fragmentation by up to 78% while only using huge pages on several production servers. We address ML-specific questions such as tolerating mispredictions and amortizing expensive predictions across application execution. Although our results focus on memory allocation, the questions we identify apply to other system-level problems with strict latency and resource requirements where machine learning could be applied.
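
The abstract's core idea (group objects by predicted lifetime class rather than by size class, and amortize expensive predictions across the run) can be illustrated with a toy sketch. This is not LLAMA's implementation: the three class names, the `LifetimeHeap` and `HugePage` types, and the `toy_predictor` function are hypothetical stand-ins, the predictor replaces the neural language model with a trivial rule, and the sketch omits blocks, lines, deallocation, and the dynamic adjustment of classes after mispredictions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical lifetime classes; the abstract does not enumerate the real ones.
LIFETIME_CLASSES = ("short", "medium", "long")

HUGE_PAGE_BYTES = 2 * 1024 * 1024  # 2MB huge page, as stated in the abstract


@dataclass
class HugePage:
    """A huge page that only holds objects of one predicted lifetime class."""
    lifetime_class: str
    used_bytes: int = 0

    def try_allocate(self, size: int) -> bool:
        # Bump-style accounting only; no real memory is reserved here.
        if self.used_bytes + size > HUGE_PAGE_BYTES:
            return False
        self.used_bytes += size
        return True


class LifetimeHeap:
    """Toy heap organized by lifetime class instead of size class."""

    def __init__(self, predict: Callable[[Tuple[str, ...]], str]):
        self._predict = predict  # stand-in for the learned model
        self._cache: Dict[Tuple[str, ...], str] = {}
        self.pages: Dict[str, List[HugePage]] = {c: [] for c in LIFETIME_CLASSES}
        self.model_calls = 0

    def lifetime_class_for(self, context: Sequence[str]) -> str:
        # Cache per calling context so the expensive prediction runs once
        # per context rather than once per allocation.
        key = tuple(context)
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._predict(key)
        return self._cache[key]

    def allocate(self, context: Sequence[str], size: int) -> HugePage:
        cls = self.lifetime_class_for(context)
        for page in self.pages[cls]:
            if page.try_allocate(size):
                return page
        page = HugePage(cls)
        if not page.try_allocate(size):
            raise ValueError("object larger than a huge page")
        self.pages[cls].append(page)
        return page


def toy_predictor(context: Tuple[str, ...]) -> str:
    # Assumption for illustration: frames mentioning "Cache" allocate
    # long-lived objects. The paper uses a neural language model over
    # symbolized calling contexts instead.
    return "long" if any("Cache" in frame for frame in context) else "short"


heap = LifetimeHeap(toy_predictor)
cached = heap.allocate(["main", "Server::Handle", "Cache::Insert"], 1024)
request = heap.allocate(["main", "Server::Handle", "Request::Parse"], 512)
print(cached.lifetime_class, request.lifetime_class, heap.model_calls)
```

In this sketch, objects predicted to be long-lived never share a huge page with short-lived ones, so a page of short-lived objects can become entirely free once its objects die. Repeated allocations from the same calling context reuse the cached class and do not invoke the predictor again.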
