Maintaining good performance in disk arrays during failure via uniform parity group distribution

Disk arrays are increasingly used in distributed computing systems as a vehicle for reliable, high-performance data storage. When a disk in a RAID-5 array fails, the data on the failed disk remains available through parity reconstruction, which reads from the other disks. However, this places an increased burden on the surviving disks, and if this consequence of failure is not taken into account, system performance may degrade to an unacceptable level. This paper describes techniques that enable a disk array to maintain good performance in the event of a disk failure. After a failed disk has been repaired, its contents must be reconstructed from all the associated parity groups. In RAID-5, this must be a single-threaded sequential process. With the techniques introduced in this paper, this sequential process can be broken down into multiple parallel processes distributed throughout the array, thus shortening the reconstruction time. Although the techniques are applied here to disk arrays, they may have general applications in other areas of distributed computing.
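
The parity reconstruction mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the usual RAID-5 scheme in which the parity block of a group is the byte-wise XOR of its data blocks; the helper `xor_blocks`, the four-disk group, and the block contents are hypothetical and chosen only for illustration. It does not show the paper's parity group distribution technique, only why a failure adds load: rebuilding one block requires a read from every surviving disk in the group.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One parity group spread over four disks: three data blocks plus one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate the loss of disk 1 and rebuild its block from the surviving disks.
failed = 1
survivors = [block for i, block in enumerate(stripe) if i != failed]
rebuilt = xor_blocks(survivors)

assert rebuilt == stripe[failed]
```

Because XOR is its own inverse, the same routine recovers either a data block or the parity block, whichever disk is lost.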
