On-Line Data Reconstruction in Redundant Disk Arrays (CMU-CS-94-164)

To meet the bandwidth needs of modern computer systems, parallel storage systems are evolving beyond RAID levels 1 through 5. The Parallel Data Lab at Carnegie Mellon University has constructed three Scotch parallel storage testbeds to explore and evaluate five directions in RAID evolution: first, new RAID architectures that reduce the cost/performance penalty of maintaining redundant data; second, an extensible software framework for rapid prototyping of new architectures; third, mechanisms that simplify and automate error handling in RAID subsystems; fourth, a file system extension that allows serial programs to exploit parallel storage; and fifth, a parallel file system that extends the advantages of RAID to distributed, parallel computing environments. This paper describes these five directions and the testbeds in which they are being implemented and evaluated.
