∅sim: Preparing System Software for a World with Terabyte-scale Memories

Recent advances in memory technologies mean that commodity machines may soon have terabytes of memory; however, such machines remain expensive and uncommon today. Hence, few programmers and researchers can debug and prototype fixes for scalability problems or explore new system behavior caused by terabyte-scale memories. To enable rapid, early prototyping and exploration of system software for such machines, we built and open-sourced the ∅sim simulator. ∅sim uses virtualization to simulate the execution of huge workloads on modest machines. Our key observation is that many workloads follow the same control flow regardless of their input. We call such workloads data-oblivious. ∅sim harnesses data-obliviousness to make huge simulations feasible and fast via memory compression. ∅sim is accurate enough for many tasks and can simulate a guest system 20-30x larger than the host with 8-100x slowdown for the workloads we observed, with more compressible workloads running faster. For example, we simulate a 1TB machine on a 31GB machine, and a 4TB machine on a 160GB machine. We give case studies to demonstrate the utility of ∅sim. For example, we find that for mixed workloads, the Linux kernel can create irreparable fragmentation despite dozens of GBs of free memory, and we use ∅sim to debug unexpected failures of memcached with huge memories.
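The link between data-obliviousness and memory compression can be illustrated with a minimal sketch. It is not ∅sim's implementation: the 4 KiB page size, the use of zlib, the all-zero input, and the helper names `compressed_size` and `compression_ratio` are assumptions chosen for illustration. The idea is that if a workload's control flow does not depend on its input, it can be given highly predictable data, and pages holding that data shrink to a small fraction of their raw size.

```python
import os
import zlib

PAGE_SIZE = 4096  # bytes; a common page size, assumed here for illustration


def compressed_size(pages):
    """Return the total zlib-compressed size, in bytes, of a list of pages."""
    return sum(len(zlib.compress(page)) for page in pages)


def compression_ratio(pages):
    """Return raw size divided by compressed size for a list of pages."""
    raw = sum(len(page) for page in pages)
    return raw / compressed_size(pages)


# Assumption: a data-oblivious workload is fed predictable input (all zeros),
# which leaves its control flow unchanged but makes its pages compressible.
zero_pages = [bytes(PAGE_SIZE) for _ in range(256)]

# Pages of random bytes stand in for incompressible data.
random_pages = [os.urandom(PAGE_SIZE) for _ in range(256)]

zero_ratio = compression_ratio(zero_pages)
random_ratio = compression_ratio(random_pages)
print(f"zero-filled pages: {zero_ratio:.1f}x")
print(f"random pages:      {random_ratio:.2f}x")
```

The gap between the two ratios is consistent with the abstract's statement that more compressible workloads run faster: the less space a guest page occupies on the host, the larger the guest memory a modest host can back.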
