TSOPER: Efficient Coherence-Based Strict Persistency

We propose TSOPER, a novel approach to hardware-based strict TSO persistency. TSOPER lets a TSO persistency model freely coalesce values in the caches by forming atomic groups of cachelines to be persisted. A group persist is initiated for an atomic group if any of its newly written values is exposed to the outside world. A key difference from prior work is that our architecture is built around a TSO persist buffer that sits in parallel to the shared LLC and persists atomic groups directly from the private caches to NVM, bypassing the coherence serialization of the LLC.

To impose dependencies among atomic groups that are persisted from the private caches to the TSO persist buffer, we introduce a sharing-list coherence protocol that naturally captures the order of coherence operations in its sharing lists. It can therefore reconstruct the dependencies among different atomic groups entirely at the private-cache level, without involving the shared LLC. The combination of the sharing-list coherence and the TSO persist buffer allows persist operations and writes to non-volatile memory to happen in the background and trail the coherence operations. Coherence runs ahead at full speed; persistency follows belatedly.

Our evaluation shows that TSOPER provides the same level of reordering as a program-driven relaxed model and hence approximately the same level of performance, without requiring the programmer or compiler to be concerned about false sharing, data-race-free semantics, etc. It also guarantees that all software that can run on top of TSO automatically persists in TSO.
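The interplay of coalescing, exposure, and dependency ordering described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the TSOPER hardware: the names `AtomicGroup`, `ToyPersistModel`, `close`, and `drain` are hypothetical, the sharing-list protocol is replaced by a simple last-writer table, and eviction and buffer capacity are ignored. It shows writes coalescing in an open group, a group being closed when another core touches one of its lines, and groups becoming durable only after the groups they depend on.

```python
from dataclasses import dataclass, field


@dataclass
class AtomicGroup:
    gid: int
    core: int
    lines: dict = field(default_factory=dict)  # cacheline address -> coalesced value
    deps: set = field(default_factory=set)     # gids that must be durable first


class ToyPersistModel:
    """Hypothetical software model; an assumption-laden sketch, not the paper's design."""

    def __init__(self, num_cores):
        self.next_gid = 0
        self.groups = {}        # gid -> AtomicGroup
        self.last_writer = {}   # address -> gid of the latest group that wrote it
        self.mem = {}           # coherent (volatile) view, always up to date
        self.nvm = {}           # durable view, trails the coherent view
        self.pending = []       # closed groups waiting to persist
        self.persisted = set()  # gids that are already durable
        self.open = {c: self._new_group(c) for c in range(num_cores)}

    def _new_group(self, core, deps=()):
        g = AtomicGroup(self.next_gid, core, deps=set(deps))
        self.groups[g.gid] = g
        self.next_gid += 1
        return g

    def close(self, core):
        """Freeze the core's open group and queue it for persisting."""
        g = self.open[core]
        if not g.lines:
            return  # nothing written yet, so nothing to persist
        self.pending.append(g)
        # The next group of the same core must persist after this one.
        self.open[core] = self._new_group(core, deps={g.gid})

    def _expose(self, core, addr):
        """Another core touches addr: close the writer's group, record the dependency."""
        gid = self.last_writer.get(addr)
        if gid is None or self.groups[gid].core == core:
            return
        owner = self.groups[gid]
        if self.open[owner.core] is owner:
            self.close(owner.core)
        self.open[core].deps.add(gid)

    def write(self, core, addr, value):
        self._expose(core, addr)
        self.open[core].lines[addr] = value  # coalesces repeated writes
        self.mem[addr] = value
        self.last_writer[addr] = self.open[core].gid

    def read(self, core, addr):
        self._expose(core, addr)
        return self.mem.get(addr)

    def drain(self):
        """Persist every pending group whose dependencies are durable; return the order."""
        order = []
        progress = True
        while progress:
            progress = False
            for g in list(self.pending):
                if g.deps <= self.persisted:
                    self.nvm.update(g.lines)  # the whole group becomes durable at once
                    self.persisted.add(g.gid)
                    self.pending.remove(g)
                    order.append(g.gid)
                    progress = True
        return order
```

In this toy model, a core that writes the same line twice persists only the final value, and nothing reaches `nvm` until a second core reads or writes one of those lines (or `close` is called explicitly). The coherent view in `mem` is always current, while `nvm` catches up only when `drain` runs, which loosely mirrors the idea that coherence runs ahead and persistency follows.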
