A Case for Scoped Persist Barriers in GPUs

Two key trends in computing are evident: the emergence of the GPU as a first-class compute element and the emergence of byte-addressable nonvolatile memory (NVRAM) technologies as a supplement to DRAM. GPUs and NVRAMs are likely to coexist in future systems. However, previous work has focused on either GPUs or NVRAMs in isolation. In this work, we investigate the enhancements a GPU needs to manipulate NVRAM-resident persistent data structures efficiently and correctly. Specifically, we find that previously proposed CPU-centric persist barriers fall short for GPUs. We therefore introduce scoped persist barriers, which align with the hierarchical programming framework of GPUs. Scoped persist barriers let GPU programmers express which execution group (a.k.a. scope) a given persist barrier applies to. We demonstrate that (1) using a narrower scope than the algorithm requires can leave a persistent data structure inconsistent, and (2) using a wider scope than necessary causes significant performance loss (e.g., 25% or more). Therefore, future GPUs can benefit from persist barriers with different scopes.
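The trade-off between narrow and wide scopes can be illustrated with a toy model. The sketch below is not the design evaluated in this work. It assumes, purely for illustration, that each thread block buffers its writes in volatile storage and that a persist barrier drains the buffers of every block inside the chosen scope. The names `Scope`, `ToyGpuNvram`, `persist_barrier`, and `publish` are hypothetical, and the count of drained buffers is only a crude stand-in for barrier cost.

```python
from enum import IntEnum


class Scope(IntEnum):
    # Hypothetical scope names, chosen for this sketch only.
    BLOCK = 1   # only the issuing thread block
    DEVICE = 2  # every thread block on the GPU


class ToyGpuNvram:
    """Toy model: per-block volatile write buffers in front of NVRAM."""

    def __init__(self, num_blocks):
        self.pending = {b: [] for b in range(num_blocks)}  # lost on a crash
        self.nvram = {}                                    # survives a crash
        self.buffers_drained = 0                           # crude cost proxy

    def write(self, block, key, value):
        # A write is buffered and not yet durable.
        self.pending[block].append((key, value))

    def persist_barrier(self, block, scope):
        # Drain every buffer that falls inside the requested scope.
        targets = [block] if scope == Scope.BLOCK else list(self.pending)
        for b in targets:
            for key, value in self.pending[b]:
                self.nvram[key] = value
            self.pending[b].clear()
            self.buffers_drained += 1

    def crash(self):
        # Power failure: buffered writes vanish, NVRAM contents remain.
        for buf in self.pending.values():
            buf.clear()
        return dict(self.nvram)


def publish(scope, num_blocks=4):
    """Block 0 writes a payload; block 1 then publishes a 'valid' flag."""
    gpu = ToyGpuNvram(num_blocks)
    gpu.write(0, "payload", 42)            # produced by block 0
    gpu.persist_barrier(1, scope)          # block 1 orders earlier persists
    gpu.write(1, "valid", True)            # flag claims the payload is durable
    gpu.persist_barrier(1, Scope.BLOCK)    # the flag itself becomes durable
    return gpu.crash(), gpu.buffers_drained


if __name__ == "__main__":
    for scope in (Scope.BLOCK, Scope.DEVICE):
        state, cost = publish(scope)
        print(scope.name, state, "buffers drained:", cost)
```

In this model, a first barrier with `Scope.BLOCK` leaves the recovered state holding the `valid` flag without the payload, because the payload was written by a different block than the one issuing the barrier. That is the kind of inconsistency a too-narrow scope can cause. With `Scope.DEVICE` both values survive, but the barrier drains every block's buffer, which hints at why an unnecessarily wide scope costs performance.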
