Deconstructing storage arrays

We introduce Shear, a user-level software tool that characterizes RAID storage arrays. Shear combines controlled algorithms with statistical techniques to automatically determine the important properties of a RAID system, including the number of disks, chunk size, level of redundancy, and layout scheme. We illustrate the correctness of Shear by running it on numerous simulated configurations, and then verify its real-world applicability by running it on both software-based and hardware-based RAID systems. Finally, we demonstrate the utility of Shear through three case studies. First, we show how Shear can be used in a storage management environment to verify RAID construction and detect failures. Second, we demonstrate how Shear can be used to extract detailed characteristics of the individual disks within an array. Third, we show how an operating system can use Shear to automatically tune its storage subsystems to specific RAID configurations.
