Scalability of RAID systems

RAID systems (Redundant Arrays of Inexpensive Disks) have dominated back-end storage systems for more than two decades and have grown continuously in size and complexity. They now face unprecedented challenges from data-intensive applications such as image processing, transaction processing and data warehousing. As RAID systems grow, designers face both performance and reliability challenges, including limited back-end network bandwidth, physical interconnect failures, correlated disk failures and long disk reconstruction times. This thesis studies the scalability of RAID systems in terms of both performance and reliability through simulation, using a discrete-event-driven simulator for RAID systems (SIMRAID) developed as part of this project. SIMRAID incorporates two benchmark workload generators, based on the SPC-1 and Iometer benchmark specifications. Each component of SIMRAID is highly parameterised, enabling it to explore a large design space. To improve simulation speed, SIMRAID uses a set of abstraction techniques that capture the behaviour of the interconnection protocol without losing accuracy. Finally, to follow the technology trend toward heterogeneous storage architectures, SIMRAID provides a framework that allows easy modelling of different device types and interconnection techniques. Simulation experiments were first carried out on the performance aspects of scalability. They were designed to answer two questions: (1) given a number of disks, which factors affect the back-end network bandwidth requirement; and (2) given an interconnection network, how many disks can be connected to the system. The results show that the bandwidth requirement per disk is determined primarily by workload features and stripe unit size (a smaller stripe unit size scales better than a larger one), while cache size and RAID algorithm have very little effect on this value.
The maximum number of disks is limited, as would be expected, by the back-end network bandwidth. Studies of reliability have led to three proposals to improve the reliability and scalability of RAID systems. Firstly, a novel data layout called PCDSDF is proposed. PCDSDF combines the advantages of orthogonal data layouts and parity-declustering data layouts, so that it can not only survive multiple disk failures caused by physical interconnect failures or correlated disk failures, but also deliver good degraded-mode and rebuild performance. The process that generates a PCDSDF layout is deterministic and time-efficient. The number of stripes per rotation (namely the number of stripes to achieve rebuild
