RFPL: A Recovery Friendly Parity Logging Scheme for Reducing Small Write Penalty of SSD RAID

Parity-based RAID suffers from poor small-write performance because of the heavy overhead of parity updates. The recently proposed method EPLOG constructs a new stripe from updated data chunks without updating the old parity chunks. However, because data accesses are skewed, old versions of updated data chunks often have to be kept to protect the other data chunks of the same stripe. This seriously hurts the efficiency of recovering the system from device failures, since the preserved old data chunks on failed devices must also be reconstructed. In this paper, we propose a Recovery Friendly Parity Logging scheme, called RFPL, which minimizes the small-write penalty and provides high recovery performance for SSD RAID. The key idea of RFPL is to reduce the mixture of old and new data chunks in a stripe by exploiting the skewness of data accesses. RFPL constructs a new stripe from updated data chunks that belong to the same old stripe. Since the cold data chunks of the old stripe are rarely updated, the data chunks written to the new stripe are likely to all be hot data and to become old together within a short time span. This "co-old" behavior of the data chunks in a stripe effectively reduces the total number of old data chunks that need to be preserved. We have implemented RFPL on a RAID-5 SSD array in Linux 4.3. Experimental results show that, compared with the Linux software RAID, RFPL reduces user I/O response time by 83.1% in the normal state and 81.6% in the reconstruction state. Compared with the state-of-the-art scheme EPLOG, RFPL reduces user I/O response time by 46.8% in the normal state and 40.9% in the reconstruction state. Our reliability analysis shows that RFPL improves the mean time to data loss (MTTDL) by 9.36X and 1.44X compared with the Linux software RAID and EPLOG, respectively.
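
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The first part shows the standard RAID-5 read-modify-write parity update that causes the small-write penalty. The second part illustrates, under simplifying assumptions, the idea of forming a new stripe only from updated chunks that came from the same old stripe. The function names, the `chunks_per_stripe` parameter, and the mapping of a chunk number to its old stripe by integer division are all hypothetical choices made for illustration; the abstract does not specify how RFPL handles partially filled stripes or metadata.

```python
from collections import defaultdict
from functools import reduce


def xor_chunks(*chunks: bytes) -> bytes:
    """Bytewise XOR of equal-length chunks (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write update: one small write costs two reads
    (old data, old parity) and two writes (new data, new parity)."""
    return xor_chunks(old_parity, old_data, new_data)


def group_updates_by_old_stripe(updates, chunks_per_stripe):
    """Group pending updates by the stripe they originally belonged to.

    updates: iterable of (logical_chunk_no, data) pairs.
    Assumption (hypothetical layout): old stripe id = chunk_no // chunks_per_stripe.
    Returns {old_stripe_id: (members, log_parity)}, where log_parity protects
    only the new chunks of that group and the old parity is left untouched.
    """
    groups = defaultdict(list)
    for chunk_no, data in updates:
        groups[chunk_no // chunks_per_stripe].append((chunk_no, data))
    return {
        stripe_id: (members, xor_chunks(*(data for _, data in members)))
        for stripe_id, members in groups.items()
    }


# Small-write penalty: updating d1 requires touching the parity as well.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0a\x0b"
parity = xor_chunks(d0, d1, d2)
n1 = b"\xff\x00"
new_parity = rmw_parity(parity, d1, n1)

# Grouping: chunks 0 and 2 share old stripe 0; chunk 5 belongs to old stripe 1.
pending = [(0, b"\x01\x01"), (5, b"\x0f\x00"), (2, b"\x02\x02")]
new_stripes = group_updates_by_old_stripe(pending, chunks_per_stripe=3)
```

Grouping by old stripe is what lets hot chunks that are updated together also become obsolete together, so fewer old versions need to be retained to keep older stripes recoverable.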
