Persistent State Machines for Recoverable In-memory Storage Systems with NVRam

Distributed in-memory storage systems are crucial for meeting the low-latency requirements of modern datacenter services. However, they lose all state on failure, so recovery is expensive and data loss is always a risk. Persistent memory (PM) offers the possibility of building fast, persistent in-memory storage, but existing PM systems are either built from scratch or require heavy modification of existing systems. To address these problems, this paper presents Persimmon, a PM-based system that converts existing distributed in-memory storage systems into persistent, crash-consistent versions with low overhead and minimal code changes.
