Practical scrubbing: Getting to the bad sector at the right time

Latent sector errors (LSEs) are a common hard disk failure mode in which individual sectors become inaccessible while the rest of the disk remains unaffected. To protect against LSEs, commercial storage systems use scrubbers: background processes that verify disk data. The efficiency of different scrubbing algorithms in detecting LSEs has been studied in depth; however, no attempts have been made to evaluate or mitigate the impact of scrubbing on application performance. We provide the first known evaluation of the performance impact of different scrubbing policies in implementation, along with guidelines for implementing a scrubber. To lessen this impact, we present an approach that gives conclusive answers to two questions: when should scrubbing requests be issued, and at what size, to minimize the impact on applications and maximize scrubbing throughput for a given workload? Our approach achieves six times more throughput and up to three orders of magnitude less slowdown than the default Linux I/O scheduler.
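
To make the two tuning knobs named in the abstract concrete, the following is a minimal sketch of a sequential scrubber; it is not the paper's implementation. The function name `scrub` and the parameters `request_size` and `pause_s` are hypothetical, chosen only for illustration. It assumes a Unix-like system (for `os.pread`) and a regular file standing in for a disk; a real scrubber would more likely issue device-level verify commands or bypass the page cache, and would decide when to issue requests from observed workload idleness rather than from a fixed pause.

```python
import os
import time


def scrub(path, request_size=64 * 1024, pause_s=0.0):
    """Sequentially read `path` in requests of `request_size` bytes.

    Returns (bytes_verified, bad_offsets), where bad_offsets lists the
    starting offsets of requests that failed with an I/O error.
    `pause_s` is a crude stand-in for a scheduling policy: it only
    spaces requests apart and does not detect idle periods.
    """
    bytes_verified = 0
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # st_size is meaningful for regular files; a block device
        # would need a different way to obtain its capacity.
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            length = min(request_size, size - offset)
            try:
                data = os.pread(fd, length, offset)
                bytes_verified += len(data)
            except OSError:
                # Treat a failed read as an unreadable region.
                bad_offsets.append(offset)
            offset += length
            if pause_s > 0:
                time.sleep(pause_s)
    finally:
        os.close(fd)
    return bytes_verified, bad_offsets
```

In this sketch, larger values of `request_size` mean fewer requests per byte scrubbed but a longer wait for any application request that arrives behind one of them, which is the trade-off the abstract's questions about request timing and size refer to.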
