DiskReduce: RAID for data-intensive scalable computing

Data-intensive file systems, developed for Internet services and now popular in cloud computing, provide high reliability and availability by replicating data, typically keeping three copies of everything. By contrast, high performance computing systems, which operate at comparable scale, and smaller-scale enterprise storage systems tolerate multiple failures at lower overhead through erasure-coding, or RAID, organizations. DiskReduce is a modification of the Hadoop distributed file system (HDFS) that asynchronously compresses initially triplicated data down to RAID-class redundancy overheads. In addition to increasing a cluster's storage capacity as seen by its users by up to a factor of three, DiskReduce can delay encoding long enough to deliver the performance benefits of multiple data copies.
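To make the overhead arithmetic concrete: triplication spends two extra copies per block, so users see only a third of the raw capacity, whereas a RAID group of k data blocks plus one parity block spends only 1/k extra; as k grows, user-visible capacity approaches raw capacity, which is where the "up to a factor of three" gain comes from. The sketch below, in Java since HDFS is a Java system, illustrates the delayed background-encoding idea only; the in-memory block store, the XOR single-parity (RAID-5-class) scheme, the group size, and the one-hour aging delay are all assumptions made for illustration, not the paper's actual HDFS implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * A minimal sketch of DiskReduce-style background encoding, assuming a
 * hypothetical in-memory block store. The group size, aging delay, and
 * single-parity scheme are illustrative placeholders.
 */
public class DiskReduceSketch {
    static final int GROUP_SIZE = 8;                 // data blocks per parity group (assumed)
    static final long ENCODE_DELAY_MS = 3_600_000L;  // keep 3 replicas for 1 hour (assumed)

    static class Block {
        final byte[] data;
        final long createdMs;
        int replicas = 3;  // HDFS default triplication
        Block(byte[] data, long createdMs) { this.data = data; this.createdMs = createdMs; }
    }

    /** XOR parity across a group of equal-sized blocks (RAID-5-class redundancy). */
    static byte[] xorParity(List<Block> group) {
        byte[] parity = new byte[group.get(0).data.length];
        for (Block b : group)
            for (int i = 0; i < parity.length; i++)
                parity[i] ^= b.data[i];
        return parity;
    }

    /**
     * One background pass: collect triplicated blocks older than the delay,
     * emit one parity block per full group, and drop the two extra replicas.
     */
    static void encodePass(List<Block> blocks, List<byte[]> parityStore, long nowMs) {
        List<Block> group = new ArrayList<>();
        for (Block b : blocks) {
            if (b.replicas == 3 && nowMs - b.createdMs >= ENCODE_DELAY_MS)
                group.add(b);
            if (group.size() == GROUP_SIZE) {
                parityStore.add(xorParity(group));
                for (Block g : group) g.replicas = 1;  // one data copy + shared parity
                group.clear();
            }
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Block> blocks = new ArrayList<>();
        for (int i = 0; i < GROUP_SIZE; i++)
            blocks.add(new Block(new byte[]{(byte) i, (byte) (i * 3)}, now - ENCODE_DELAY_MS));
        List<byte[]> parityStore = new ArrayList<>();
        encodePass(blocks, parityStore, now);
        System.out.println("encoded groups: " + parityStore.size());  // prints 1
    }
}
```

Running the encode pass only on blocks past the aging threshold reflects the abstract's point about delayed encoding: all three replicas stay in place while freshly written data is hottest, which is when the extra read bandwidth and scheduling flexibility of multiple copies matter most, and only cold data pays the reduction to RAID-class overhead.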
