A Novel Cost-Effective Disk Scrubbing Scheme

A distinct benefit of disk scanning, or scrubbing, is that it identifies potentially failing sectors as early as possible, which improves reliability. In general, the higher the scrubbing frequency, the higher the system reliability. However, a scanning process may take a few hours to check a whole disk, so scrubbing can cause downtime or degrade system performance. The scrubbing process also consumes energy. To reduce the impact of scrubbing on disk performance and energy consumption, system designers choose to scan the disk at a low frequency, which lowers reliability. Additionally, conventional disk scrubbing schemes assume that the disk failure rate is constant, while recent research [2][6] shows that the disk failure rate is more complex. In this paper, we present a novel scrubbing scheme that addresses these challenges. In this scheme, an optimum scrubbing cycle is determined by balancing data loss cost, scrubbing cost, and disk failure rate. Our results show that the scheme is applicable to storage systems with low-capacity disks and inexpensive data.
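The trade-off described above can be illustrated with a toy cost model. The sketch below is not the scheme from the paper: the cost function, the parameter names (`scrub_cost`, `loss_cost`, `error_rate`, `failure_rate`), and the numeric values are all assumptions made for illustration. For simplicity it also assumes constant rates, which is exactly the assumption the abstract says a realistic scheme should relax. Under this model, frequent scrubbing raises the scrubbing cost, while infrequent scrubbing leaves latent errors exposed for longer, so the total cost per hour has a minimum at some intermediate cycle length.

```python
import math


def cost_rate(interval_h, scrub_cost, loss_cost, error_rate, failure_rate):
    """Expected cost per hour for a scrub cycle of interval_h hours.

    Hypothetical toy model (an assumption, not taken from the paper):
    - each scrub pass costs scrub_cost, so scrubbing costs scrub_cost / interval_h per hour;
    - latent errors arrive at error_rate per hour and wait interval_h / 2 on
      average before the next scrub detects them;
    - data is lost, at cost loss_cost, if a second failure (failure_rate per
      hour) strikes while an error is still latent.
    """
    scrubbing = scrub_cost / interval_h
    data_loss = loss_cost * error_rate * failure_rate * interval_h / 2
    return scrubbing + data_loss


def optimal_interval(scrub_cost, loss_cost, error_rate, failure_rate):
    """Closed-form minimizer of cost_rate, found by setting its derivative to zero."""
    return math.sqrt(2 * scrub_cost / (loss_cost * error_rate * failure_rate))


def best_on_grid(candidates, **params):
    """Pick the cheapest interval from a list of candidate cycle lengths."""
    return min(candidates, key=lambda t: cost_rate(t, **params))


# Illustrative values only; none of these numbers come from the paper.
params = dict(scrub_cost=1.0, loss_cost=1e6, error_rate=1e-4, failure_rate=1e-5)
t_star = optimal_interval(**params)
t_grid = best_on_grid(range(1, 2001), **params)
print(f"closed-form optimum: {t_star:.2f} h, best whole-hour interval: {t_grid} h")
```

The grid search is included because it still works when the closed form does not, for example if `cost_rate` is replaced by a model in which the failure rate changes with disk age.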

[1]  Eduardo Pinheiro, et al. Failure Trends in a Large Disk Drive Population, 2007, FAST.

[2]  Rajarshi Das, et al. Utility functions in autonomic systems, 2004.

[3]  Kang G. Shin, et al. FS2: dynamic data replication in free disk space for improving disk performance and energy consumption, 2005, SOSP '05.

[4]  Shankar Pasupathy, et al. An analysis of latent sector errors in disk drives, 2007, SIGMETRICS '07.

[5]  Christos Faloutsos, et al. Using Utility to Provision Storage Systems, 2008, FAST.

[6]  Eric Anderson, et al. Quickly finding near-optimal storage designs, 2005, TOCS.

[7]  Mary Baker, et al. A fresh look at the reliability of long-term digital storage, 2005, EuroSys.

[8]  Thomas J. E. Schwarz. Verification of Parity Data in Large Scale Storage Systems, 2004, PDPTA.

[9]  Joel L. Wolf, et al. The placement optimization program: a practical solution to the disk file assignment problem, 1989, SIGMETRICS '89.

[10]  Alvin AuYoung, et al. Service contracts and aggregate utility functions, 2006, 2006 15th IEEE International Conference on High Performance Distributed Computing.

[11]  Hannu H. Kari. Latent Sector Faults and Reliability of Disk Arrays, 2005.

[12]  Janak H. Patel, et al. Reliability of scrubbing recovery-techniques for memory systems, 1990.

[13]  William J. Bolosky, et al. A large-scale study of file-system contents, 1999, SIGMETRICS '99.

[14]  Arif Merchant, et al. Minerva: An automated resource provisioning tool for large-scale storage systems, 2001, TOCS.

[15]  Gregory R. Ganger, et al. Modeling the relative fitness of storage, 2007, SIGMETRICS '07.

[16]  Eric Anderson, et al. Hippodrome: Running Circles around Storage Administration, 2002, FAST.

[17]  Ram Swaminathan, et al. Ergastulum: Quickly finding near-optimal storage system designs, 2001.

[18]  Arkady Kanevsky, et al. Are disks the dominant contributor for storage failures?: A comprehensive study of storage subsystem failure characteristics, 2008, TOS.

[19]  Jeffrey O. Kephart, et al. An artificial intelligence perspective on autonomic computing policies, 2004, Proceedings. Fifth IEEE International Workshop on Policies for Distributed Systems and Networks, 2004. POLICY 2004.

[20]  Dirk Beyer, et al. Designing for Disasters, 2004, FAST.

[21]  Evangelos Eleftheriou, et al. Disk scrubbing versus intra-disk redundancy for high-reliability raid storage systems, 2008, SIGMETRICS '08.

[22]  Eitan Bachmat, et al. Analysis of methods for scheduling low priority disk drive tasks, 2002, SIGMETRICS '02.

[23]  Khalil Amiri, et al. Automatic design of storage systems to meet availability requirements, 1996.

[24]  Spencer W. Ng, et al. Disk scrubbing in large archival storage systems, 2004, The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 2004. (MASCOTS 2004). Proceedings.

[25]  Ajay Dholakia, et al. A new intra-disk redundancy scheme for high-reliability RAID storage systems in the presence of unrecoverable errors, 2006, TOS.

[26]  J. Sikora. Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?, 2007.

[27]  Rajarshi Das, et al. Utility functions in autonomic systems, 2004, International Conference on Autonomic Computing, 2004. Proceedings.