Reducing Storage Overhead with Small Write Bottleneck Avoiding in Cloud RAID System

Cloud storage systems commonly replicate stored data sets to ensure high reliability and availability. However, the high storage overhead of replication is becoming increasingly unacceptable as the volume of data stored in the cloud grows explosively. Some cloud storage systems have attempted to replace replication with erasure coding to reduce storage overhead, which is the idea behind Cloud RAID. A well-designed Cloud RAID mechanism should strike the right tradeoff among storage efficiency, performance, and reliability. As there is no widely accepted method for Cloud RAID, we present a workload-based Cloud RAID scheme, Selective Cloud RAID (SCR for short). SCR applies different RAID methods to primary storage and backup storage: the former is handled at the level of directories, and the latter at the level of individual files. SCR has three distinct advantages over previous attempts at Cloud RAID: (1) it significantly reduces storage overhead compared with three-way replication; (2) it avoids most cases of the "small write bottleneck" and simplifies system maintenance; and (3) its implementation is modular, so it is easy to configure different erasure codes for different workloads. Additionally, we have implemented an SCR prototype with RDP code, which shows significant benefits over Blaum-Roth codes in degraded read performance. To verify the effectiveness of SCR, we perform a theoretical analysis and extensive benchmark tests to evaluate the performance of the SCR prototype.
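The two quantities the abstract contrasts, storage overhead and the cost of a small write, can be illustrated with a minimal sketch. It is not the SCR implementation. The function names are hypothetical, the prime p = 11 is an arbitrary choice for illustration, and single XOR parity stands in for a full erasure code when showing the small-write update. The only coding fact assumed is that RDP with prime p uses p - 1 data disks plus one row-parity and one diagonal-parity disk.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data under n-way replication."""
    return Fraction(copies, 1)


def erasure_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per byte of user data for a code with the given
    number of data and parity units per stripe."""
    return Fraction(data_units + parity_units, data_units)


def rdp_overhead(p: int) -> Fraction:
    """RDP with prime p: p - 1 data disks, one row-parity disk and one
    diagonal-parity disk."""
    return erasure_overhead(p - 1, 2)


def small_write_parity(old_data: bytes, new_data: bytes,
                       old_parity: bytes) -> bytes:
    """Read-modify-write parity update for single XOR parity (a stand-in
    for a real erasure code). Updating one block requires reading the old
    data and old parity, then writing the new data and new parity: four
    I/Os for one logical write, which is the small write bottleneck."""
    return bytes(d ^ n ^ q for d, n, q in zip(old_data, new_data, old_parity))


if __name__ == "__main__":
    p = 11  # hypothetical prime, chosen only for illustration
    three_way = replication_overhead(3)
    rdp = rdp_overhead(p)
    print("three-way replication overhead:", three_way)
    print(f"RDP (p={p}) overhead:", rdp)
    print("fraction of raw capacity saved:", 1 - rdp / three_way)
```

The sketch suggests why a scheme would want to apply erasure coding selectively: the capacity saving is large, but every in-place update of a small block pays extra reads and writes to keep parity consistent, whereas replication only writes the new copies.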
