NCQ vs. I/O scheduler: Preventing unexpected misbehaviors

Native Command Queueing (NCQ) is an optimization technology that maximizes throughput by reordering requests inside a disk drive. It has been so successful that it has become a standard feature of the SATA 2 protocol specification, and the great majority of disk vendors have adopted it in their recent disks. However, the technology can create an information gap between the OS and the disk drive. An NCQ-enabled disk tries to optimize throughput without knowing the intention of the OS, whereas the OS, lacking specific knowledge of the disk's internal mechanism, does its best under the assumption that the disk will do as it is told. We call this mismatch expectation discord; it may cause serious problems such as request starvation or performance anomalies. In this article, we (1) confirm that expectation discord actually occurs in real systems; (2) propose software-level approaches to solve it; and (3) evaluate our mechanism. Experimental results show that our solution is simple, cheap (no special hardware required), portable, and effective.
