Delegated persist ordering

Systems featuring a load-store interface to persistent memory (PM) are expected soon, making in-memory persistent data structures feasible. Ensuring persistent data structure recoverability requires constraints on the order PM writes become persistent. But, current memory systems reorder writes, providing no such guarantees. To complement their upcoming 3D XPoint memory, Intel has announced new instructions to enable programmer control of data persistence. We describe the semantics implied by these instructions, an ordering model we call synchronous ordering. Synchronous ordering (SO) enforces order by stalling execution when PM write ordering is required, exposing PM write latency on the execution critical path. It incurs an average slowdown of 7.21x over volatile execution without ordering in PM-write-intensive benchmarks. SO tightly couples enforcing order and flushing writes to PM, but this tight coupling is unneeded in many recoverable software systems. Instead, we propose delegated ordering, wherein ordering requirements are communicated explicitly to the PM controller, fully decoupling PM write ordering from volatile execution and cache management. We demonstrate that delegated ordering can bring performance within 1.93x of volatile execution, improving over SO by 3.73x.

[1]  Vijayalakshmi Srinivasan,et al.  Enhancing lifetime and security of PCM-based Main Memory with Start-Gap Wear Leveling , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Ren-Shuo Liu,et al.  NVM duet: unified working memory and persistent store architecture , 2014, ASPLOS.

[3]  Thomas F. Wenisch,et al.  Persistency programming 101 , 2014 .

[4]  Orion Hodson,et al.  Whole-system persistence , 2012, ASPLOS XVII.

[5]  Hans-Juergen Boehm,et al.  Persistence programming models for non-volatile memory , 2016, ISMM.

[6]  Andrea C. Arpaci-Dusseau,et al.  Optimistic crash consistency , 2013, SOSP.

[7]  Vijayalakshmi Srinivasan,et al.  Efficient scrub mechanisms for error-prone emerging memories , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[8]  A. L. Narasimha Reddy,et al.  SCMFS: A file system for Storage Class Memory , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[9]  Anoop Gupta,et al.  Two Techniques to Enhance the Performance of Memory Consistency Models , 1991, ICPP.

[10]  Ryan Johnson,et al.  Scalable Logging through Emerging Non-Volatile Memory , 2014, Proc. VLDB Endow..

[11]  Rajesh K. Gupta,et al.  NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories , 2011, ASPLOS XVI.

[12]  Onur Mutlu,et al.  Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[13]  David Seal,et al.  ARM Architecture Reference Manual , 2001 .

[14]  Onur Mutlu,et al.  FIRM: Fair and High-Performance Memory Control for Persistent Memory Systems , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[15]  Terence Kelly,et al.  Failure-Atomic Persistent Memory Updates via JUSTDO Logging , 2016 .

[16]  Hans-Juergen Boehm,et al.  Atlas: leveraging locks for non-volatile memory consistency , 2014, OOPSLA.

[17]  Thomas F. Wenisch,et al.  InvisiFence: performance-transparent memory ordering in conventional multiprocessors , 2009, ISCA '09.

[18]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[19]  Stratis Viglas,et al.  REWIND: Recovery Write-Ahead System for In-Memory Non-Volatile Data-Structures , 2015, Proc. VLDB Endow..

[20]  Luis A. Lastras,et al.  Practical and secure PCM systems by online detection of malicious write streams , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[21]  Josep Torrellas,et al.  BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.

[22]  Yifeng Zhu,et al.  Accelerating write by exploiting PCM asymmetries , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[23]  Margaret Martonosi,et al.  ArMOR: Defending against memory consistency model mismatches in heterogeneous architectures , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[24]  Yale N. Patt,et al.  Soft updates: a solution to the metadata update problem in file systems , 2000 .

[25]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[26]  Mor Harchol-Balter,et al.  ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[27]  Thomas F. Wenisch,et al.  Memory persistency , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[28]  Karin Strauss,et al.  Preventing PCM banks from seizing too much power , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[29]  Thomas F. Wenisch,et al.  High-Performance Transactions for Persistent Memories , 2016, ASPLOS.

[30]  Sarita V. Adve,et al.  Shared Memory Consistency Models: A Tutorial , 1996, Computer.

[31]  Dhruva R. Chakrabarti,et al.  Implications of CPU Caching on Byte-addressable Non-Volatile Memory Programming , 2012 .

[32]  Tao Zhang,et al.  Overcoming the challenges of crossbar resistive memory architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[33]  James R. Larus,et al.  Transactional Memory , 2006, Transactional Memory.

[34]  Sanjay Kumar,et al.  System software for persistent memory , 2014, EuroSys '14.

[35]  Jade Alglave,et al.  Understanding POWER multiprocessors , 2011, PLDI '11.

[36]  Youyou Lu,et al.  Loose-Ordering Consistency for persistent memory , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).

[37]  Thomas F. Wenisch,et al.  Mechanisms for store-wait-free multiprocessors , 2007, ISCA '07.

[38]  Kevin Kai-Wei Chang,et al.  Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[39]  Hyunjin Lee,et al.  Flip-N-Write: A simple deterministic technique to improve PRAM write performance, energy and endurance , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[40]  Yuan Xie,et al.  Kiln: Closing the performance gap between systems with and without persistence support , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[41]  Sarita V. Adve,et al.  Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models , 1997, SPAA '97.

[42]  Stratis Viglas,et al.  Efficient persist barriers for multicores , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[43]  T. N. Vijaykumar,et al.  Is SC+ILP=RC? , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).

[44]  Jun Yang,et al.  A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[45]  Terence Kelly,et al.  Procrastination Beats Prevention: Timely Sufficient Persistence for Efficient Crash Resilience , 2015, EDBT.

[46]  Christopher Frost,et al.  Better I/O through byte-addressable, persistent memory , 2009, SOSP '09.

[47]  Jade Alglave,et al.  Herding cats: modelling, simulation, testing, and data-mining for weak memory , 2014, PLDI 2014.

[48]  Michael M. Swift,et al.  Mnemosyne: lightweight persistent memory , 2011, ASPLOS XVI.