Facilitating Magnetic Recording Technology Scaling for Data Center Hard Disk Drives through Filesystem-Level Transparent Local Erasure Coding

This paper presents a simple yet effective design solution to facilitate technology scaling for hard disk drives (HDDs) deployed in data centers. Emerging magnetic recording technologies improve storage areal density mainly by reducing the track pitch, which in turn makes HDDs subject to higher read retry rates. More frequent HDD read retries could cause intolerable tail latency in large-scale systems such as data centers. To reduce the occurrence of costly read retries, one intuitive solution is to apply erasure coding locally on each HDD or JBOD (just a bunch of disks). To be practically viable, local erasure coding must have very low coding redundancy, which demands a very long codeword length (e.g., one codeword spans hundreds of 4 kB sectors) and hence a large file size. This makes local erasure coding mainly suitable for data center applications. This paper contends that local erasure coding should be implemented transparently within filesystems, and accordingly presents a basic design framework and elaborates on important design issues. It also derives mathematical formulations for estimating the effect of local erasure coding on HDD read tail latency. Using Reed-Solomon (RS) based erasure codes as test vehicles, we carried out detailed analysis and experiments to evaluate the implementation feasibility and effectiveness of the proposed design. We integrated the design solution into ext4 to further demonstrate its feasibility and to quantitatively measure its impact on the average speed performance of various big data benchmarks.
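The trade-off between codeword length, coding redundancy, and read retry rate can be illustrated with a small calculation. The sketch below is not the paper's formulation: it assumes, purely for illustration, that each sector fails its first read independently with probability `p`, and that a codeword of `k` data sectors plus `m` parity sectors avoids a retry whenever at most `m` of its sectors fail. The function names and the numbers in the usage note are hypothetical.

```python
from math import comb


def redundancy(k: int, m: int) -> float:
    """Fraction of the codeword occupied by parity sectors."""
    return m / (k + m)


def retry_probability(k: int, m: int, p: float) -> float:
    """Probability that more than m of the k + m sectors fail their first read.

    Illustrative assumption: sector read failures are independent and each
    occurs with probability p. With m = 0 this is the probability that at
    least one of the k sectors needs a retry.
    """
    n = k + m
    # Sum the binomial tail directly to avoid cancellation in 1 - P(X <= m).
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(m + 1, n + 1)
    )


if __name__ == "__main__":
    k, p = 256, 1e-3  # hypothetical: 256 data sectors, 0.1% first-read failure rate
    for m in (0, 1, 2, 4):
        print(f"m={m}  redundancy={redundancy(k, m):.4f}  "
              f"P(retry)={retry_probability(k, m, p):.3e}")
```

Under these assumptions, adding a handful of parity sectors to a codeword of a few hundred sectors keeps redundancy near one or two percent while sharply lowering the chance that a read falls back to a retry. That is the intuition behind pairing low redundancy with long codewords.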
