Balancing I/O response time and disk rebuild time in a RAID5 disk array

When a disk in a RAID5 disk array has failed, requests to that disk can only be serviced by reading data from all surviving disks and reconstructing the lost data. This may cut disk performance in half. To avoid this degradation, all of the lost data must be rebuilt and written to a spare disk. The faster the data are rebuilt, the sooner the disk array returns to normal operation. Giving high priority to the rebuild process, however, can increase response times for incoming application requests, which compete for disk service. A balance must be found between acceptable application response times and disk rebuild times. Simulation was used to evaluate the effect of the rebuild unit size on response time and rebuild time. The authors have found this tradeoff to be embodied in the choice of the rebuild unit, the amount of rebuild data which is atomically read from each surviving disk. They find that a single-track rebuild unit provides faster rebuild times than a one-sector rebuild unit, and that rebuilding one track at a time provides better application request response times than rebuilding one cylinder at a time.
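
The reconstruction step described above can be illustrated with a minimal sketch. It is not the authors' simulator. It assumes a toy model in which each disk is a flat byte string and a stripe is the set of bytes at the same offset on every disk; the names `xor_blocks`, `rebuild_disk`, and `unit_size` are hypothetical. Because RAID5 parity is the XOR of the data in a stripe, XOR-ing the corresponding regions of all surviving disks recovers the lost region, whether it held data or parity. The `unit_size` parameter plays the role of the rebuild unit: the amount read from each surviving disk in one step.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_disk(surviving_disks, unit_size):
    """Rebuild the failed disk one rebuild unit at a time.

    surviving_disks: equal-length byte strings, one per surviving disk
                     (toy model; a real array addresses sectors and tracks).
    unit_size:       bytes read from each surviving disk per step
                     (hypothetical stand-in for sector/track/cylinder).
    """
    disk_len = len(surviving_disks[0])
    rebuilt = bytearray()
    for offset in range(0, disk_len, unit_size):
        # Read one rebuild unit from every surviving disk...
        units = [disk[offset:offset + unit_size] for disk in surviving_disks]
        # ...and XOR them to recover the matching unit of the lost disk.
        rebuilt += xor_blocks(units)
    return bytes(rebuilt)


# Toy array: three data disks plus parity (parity rotation is ignored here,
# since the XOR identity holds for any member of a stripe).
data_disks = [b"abcd", b"efgh", b"ijkl"]
parity_disk = xor_blocks(data_disks)
survivors = [data_disks[0], data_disks[2], parity_disk]  # second disk failed
recovered = rebuild_disk(survivors, unit_size=2)
```

In this sketch the rebuilt contents do not depend on `unit_size`; what changes is how many steps the rebuild takes and how long each step occupies the surviving disks, which is the tradeoff between rebuild time and application response time that the abstract describes.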
