Live Recovery of Bit Corruptions in Datacenter Storage Systems

Due to its high performance and decreasing cost per bit, flash is becoming the main storage medium in datacenters for hot data. However, flash endurance is a perpetual problem, and due to technology trends, subsequent generations of flash devices exhibit progressively shorter lifetimes before they experience uncorrectable bit errors. In this paper we propose extending flash lifetime by allowing devices to expose higher bit error rates. To do so, we present DIRECT, a novel set of policies that leverages latent redundancy in distributed storage systems to recover from bit corruption errors with minimal performance and recovery overhead. In doing so, DIRECT can significantly extend the lifetime of flash devices by effectively utilizing these devices even after they begin exposing bit errors. We implemented DIRECT on two real-world storage systems: ZippyDB, a distributed key-value store backed by RocksDB, and HDFS, a distributed file system. When tested on production traces at Facebook, DIRECT reduces application-visible error rates in ZippyDB by more than 10^2 and recovery time by more than 10^4. DIRECT also allows HDFS to tolerate a 10^4--10^5 higher bit error rate without experiencing application-visible errors.

[1]  Qiang Wu,et al.  A Large-Scale Study of Flash Memory Failures in the Field , 2015, SIGMETRICS 2015.

[2]  Andrea L. Lacaita,et al.  Reliability of NAND Flash Memories: Planar Cells and Emerging Issues in 3D Devices , 2017, Comput..

[3]  Andrea C. Arpaci-Dusseau,et al.  Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions , 2017, FAST.

[4]  Rino Micheloni,et al.  3D Flash Memories , 2016, Springer Netherlands.

[5]  GhemawatSanjay,et al.  The Google file system , 2003 .

[6]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.

[7]  Nanning Zheng,et al.  LDPC-in-SSD: making advanced error correction codes work effectively in solid state drives , 2013, FAST.

[8]  Gérard D. Cohen,et al.  Error-correcting WOM-codes , 1991, IEEE Trans. Inf. Theory.

[9]  Sungjin Lee,et al.  FlexFS: A Flexible Flash File System for MLC NAND Flash Memory , 2009, USENIX Annual Technical Conference.

[10]  Ren-Shuo Liu,et al.  DuraCache: A durable SSD cache using MLC NAND flash , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[11]  Cheng Huang,et al.  Erasure Coding in Windows Azure Storage , 2012, USENIX Annual Technical Conference.

[12]  Andrea C. Arpaci-Dusseau,et al.  Analysis of HDFS under HBase: a facebook messages case study , 2014, FAST.

[13]  Cheng Li,et al.  Nitro: A Capacity-Optimized SSD Cache for Primary Storage , 2014, USENIX Annual Technical Conference.

[14]  Su-Jin Ahn,et al.  Evolution of NAND Flash Memory: From 2D to 3D as a Storage Market Leader , 2017, 2017 IEEE International Memory Workshop (IMW).

[15]  Wei Wu,et al.  Optimizing NAND flash-based SSDs via retention relaxation , 2012, FAST.

[16]  Shuhei Tanakamaru,et al.  Post-manufacturing, 17-times acceptable raw bit error rate enhancement, dynamic codeword transition ECC scheme for highly reliable solid-state drives, SSDs , 2010, 2010 IEEE International Memory Workshop.

[17]  Seok-Hee Lee,et al.  Technology scaling challenges and opportunities of memory devices , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).

[18]  Junhee Lim,et al.  A new ruler on the storage market: 3D-NAND flash for high-density memory and its technology evolutions and challenges on the future , 2016, 2016 IEEE International Electron Devices Meeting (IEDM).

[19]  Robert Cypher,et al.  Disks for Data Centers , 2016 .

[20]  Arif Merchant,et al.  Flash Reliability in Production: The Expected and the Unexpected , 2016, FAST.

[21]  Andrea C. Arpaci-Dusseau,et al.  Protocol-Aware Recovery for Consensus-Based Storage , 2018, FAST.

[22]  Yanpei Chen,et al.  The Truth About MapReduce Performance on SSDs , 2014, LISA.

[23]  Bin Fan,et al.  SILT: a memory-efficient, high-performance key-value store , 2011, SOSP.

[24]  Sungjin Lee,et al.  Lifetime improvement of NAND flash-based storage systems using dynamic program and erase scaling , 2014, FAST.

[25]  Arif Merchant,et al.  Janus: Optimal Flash Provisioning for Cloud Storage Workloads , 2013, USENIX Annual Technical Conference.

[26]  Andrea C. Arpaci-Dusseau,et al.  Association Proceedings of the Third USENIX Conference on File and Storage Technologies San Francisco , CA , USA March 31 – April 2 , 2004 , 2004 .

[27]  Ke Zhou,et al.  FlexECC: Partially Relaxing ECC of MLC SSD for Better Cache Performance , 2014, USENIX Annual Technical Conference.

[28]  Yong Wang,et al.  SDF: software-defined flash for web-scale internet storage systems , 2014, ASPLOS.

[29]  Cheng Li,et al.  Pannier: A Container-based Flash Cache for Compound Objects , 2015, Middleware.

[30]  Andrea C. Arpaci-Dusseau,et al.  WiscKey: Separating Keys from Values in SSD-conscious Storage , 2016, FAST.

[31]  Jian Xu,et al.  NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories , 2016, FAST.

[32]  Kai Li,et al.  RIPQ: Advanced Photo Caching on Flash for Facebook , 2015, FAST.

[33]  Onur Mutlu,et al.  Error patterns in MLC NAND flash memory: Measurement, characterization, and analysis , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[34]  Hongzhong Zheng,et al.  Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling , 2014 .

[35]  Patrick E. O'Neil,et al.  The log-structured merge-tree (LSM-tree) , 1996, Acta Informatica.

[36]  Andrea C. Arpaci-Dusseau,et al.  Improving file system reliability with I/O shepherding , 2007, SOSP.

[37]  Steven Swanson,et al.  The bleak future of NAND flash memory , 2012, FAST.

[38]  Dimitris S. Papailiopoulos,et al.  XORing Elephants: Novel Erasure Codes for Big Data , 2013, Proc. VLDB Endow..

[39]  Andrea C. Arpaci-Dusseau,et al.  IRON file systems , 2005, SOSP '05.

[40]  Sachin Katti,et al.  Reducing DRAM footprint with NVM in Facebook , 2018, EuroSys.

[41]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[42]  Onur Mutlu,et al.  Data retention in MLC NAND flash memory: Characterization, optimization, and recovery , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).