Automatic design of storage systems to meet availability requirements

As storage systems continue to grow (due to the introduction of new I/O interconnects such as Fibre Channel), their design and configuration continue to be a costly and difficult task that involves complex trade-offs among cost, performance and availability. We focus on the problem of designing large storage systems to meet workload availability requirements (we use the term “workload unit” to refer to an object and the streams that access that object). We propose to reduce the difficulty of this problem by having end-users specify the availability requirements of workload units to the storage system and letting the system configure itself to meet these requirements. We demonstrate the feasibility of this proposition by describing an approach and developing a working tool that automatically designs storage systems to meet the availability requirements of a large set of workload units. The tool automatically synthesizes all candidate storage logical units that match the input workload units and assesses their reliability, availability and performance via automatically generated Markov chains. An assignment engine then selects the appropriate storage logical units and determines the assignment of workload units to storage logical units so as to minimize total storage system cost.
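
As an illustration of the two steps named above (availability assessment through a Markov chain, then cost-driven selection), the following is a minimal sketch and not the tool's actual method. It assumes a hypothetical mirrored disk pair modeled as a three-state birth-death chain with constant failure and repair rates, and it replaces the assignment engine with a simple "cheapest candidate that meets the requirement" rule. All names, rates, and costs are invented for the example.

```python
from dataclasses import dataclass


def birth_death_steady_state(up_rates, down_rates):
    """Steady-state probabilities of a birth-death Markov chain.

    up_rates[i] is the rate from state i to i+1 (a failure);
    down_rates[i] is the rate from state i+1 back to i (a repair).
    """
    weights = [1.0]
    for up, down in zip(up_rates, down_rates):
        # Detailed balance: pi[i+1] * down = pi[i] * up
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]


def single_disk_availability(fail_rate, repair_rate):
    # States: 0 = disk up, 1 = disk down (data unavailable).
    pi = birth_death_steady_state([fail_rate], [repair_rate])
    return pi[0]


def mirrored_pair_availability(fail_rate, repair_rate):
    # States: 0 = both disks up, 1 = one failed, 2 = both failed.
    # Assumption: one repair proceeds at a time, at rate repair_rate.
    pi = birth_death_steady_state(
        [2 * fail_rate, fail_rate], [repair_rate, repair_rate]
    )
    return pi[0] + pi[1]  # data is reachable while at least one disk is up


@dataclass
class Candidate:
    """Hypothetical candidate logical unit with a cost and an availability."""
    name: str
    cost: float
    availability: float


def cheapest_meeting(candidates, required_availability):
    """Return the lowest-cost candidate meeting the requirement, or None."""
    feasible = [c for c in candidates if c.availability >= required_availability]
    return min(feasible, key=lambda c: c.cost) if feasible else None


# Illustrative rates only (per hour): one failure per 100,000 h, 24 h repair.
FAIL, REPAIR = 1 / 100_000, 1 / 24
candidates = [
    Candidate("single disk", 1.0, single_disk_availability(FAIL, REPAIR)),
    Candidate("mirrored pair", 2.0, mirrored_pair_availability(FAIL, REPAIR)),
]
choice = cheapest_meeting(candidates, required_availability=0.9999)
```

Under these assumed rates the single disk falls short of a 0.9999 requirement and the mirrored pair is selected despite its higher cost. A real assignment would also have to account for capacity and performance and would optimize across all workload units together, which this per-unit rule does not attempt.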
