Automatic Recovery from Disk Failure in Continuous-Media Servers

Continuous-media (CM) servers have been around for some years. Apart from server capacity, another important issue in the deployment of CM servers is reliability. This study investigates algorithms for automatically rebuilding the data stored on a failed disk onto a spare disk. Specifically, a block-based rebuild algorithm is studied, and its rebuild time and buffer requirement are modeled. A buffer-sharing scheme is then proposed to eliminate the additional buffers needed by the rebuild process. To further improve rebuild performance, a track-based rebuild algorithm that rebuilds lost data one track at a time is proposed and analyzed. Results show that track-based rebuild, while substantially outperforming block-based rebuild, requires significantly more buffers (17-135 percent more) even with buffer sharing. To tackle this problem, a novel pipelined rebuild algorithm is proposed that exploits the sequential property of track retrievals to pipeline the reading and writing processes. This pipelined rebuild algorithm achieves the same rebuild performance as track-based rebuild, but reduces the extra buffer requirement to insignificant levels (0.7-1.9 percent). Numerical results computed using models of five commercial disk drives demonstrate that a failed disk can be rebuilt automatically in a reasonable amount of time, even at relatively high server utilization (e.g., less than 1.5 hours at 90 percent utilization).
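The intuition behind the buffer saving from pipelining can be illustrated with a toy sketch. This is not the paper's model: it assumes the rebuilt data for a track becomes available one sector per time step, that one sector can be written to the spare disk per time step, and that buffers are counted in whole sectors. The function name `peak_buffers` and its parameters are hypothetical, chosen only for illustration.

```python
def peak_buffers(sectors_per_track: int, pipelined: bool) -> int:
    """Peak number of sector buffers held while moving one track to the spare.

    Toy assumptions (not taken from the paper):
      - one sector becomes available (is read) per time step;
      - at most one sector is written per time step;
      - without pipelining, writing starts only after the whole track is read;
      - with pipelining, a sector may be written in the step after it was read.
    """
    read = 0      # sectors read so far
    written = 0   # sectors written so far
    held = 0      # sectors currently sitting in buffers
    peak = 0      # largest value of `held` observed

    while written < sectors_per_track:
        # Write phase: flush one buffered sector if the policy allows it.
        can_write = pipelined or read == sectors_per_track
        if can_write and written < read:
            written += 1
            held -= 1
        # Read phase: bring in the next sector, if any remain.
        if read < sectors_per_track:
            read += 1
            held += 1
        peak = max(peak, held)
    return peak


if __name__ == "__main__":
    track = 500  # hypothetical sectors per track
    print("read-then-write:", peak_buffers(track, pipelined=False))
    print("pipelined:      ", peak_buffers(track, pipelined=True))
```

Under these assumptions the read-then-write policy must hold an entire track in memory, whereas the pipelined policy holds only the sector in flight. The actual percentages reported above come from the paper's disk models, which this sketch does not attempt to reproduce.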
