FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives

Modern solid-state drives (SSDs) use new host–interface protocols, such as NVMe, to provide applications with fast access to storage. These new protocols make use of a concept known as the multi-queue SSD (MQ-SSD), where the SSD has direct access to the application-level I/O request queues. This removes most of the OS software stack that was used in older protocols to control how and when the I/O requests were dispatched to storage devices. Unfortunately, while the elimination of the OS software stack leads to a significant performance improvement, we show in this paper that it introduces a new problem: unfairness. This is because removing the OS software stack also removes the mechanisms that were used to provide fairness among applications in older SSDs. To study application-level unfairness, we perform experiments using four real state-of-the-art MQ-SSDs. We demonstrate that the lack of fair scheduling mechanisms leads to high unfairness among concurrently executing applications due to the interference among them. For instance, when one of these applications issues many more I/O requests than the others, the other applications are slowed down significantly. We perform a comprehensive analysis of interference in real MQ-SSDs, and find four major interference sources: (1) the intensity of requests sent by each application, (2) differences in request access patterns, (3) the ratio of reads to writes, and (4) garbage collection. To alleviate unfairness in MQ-SSDs, we propose the Flash-Level INterference-aware scheduler (FLIN). FLIN is a lightweight I/O request scheduling mechanism that provides fairness among requests from different applications. FLIN uses a three-stage scheduling algorithm that protects against all four major sources of interference, while respecting the application-level priorities assigned by the host. FLIN is implemented fully within the SSD controller firmware, requiring no new hardware, and has negligible (<0.06%) storage cost. Compared to a state-of-the-art I/O scheduler, FLIN improves the fairness and performance of a wide range of enterprise and datacenter storage workloads, with average improvements of 70% and 47%, respectively.
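To make the notion of unfairness concrete, the following is a minimal sketch of one slowdown-based fairness metric that is common in the scheduling literature. It is an assumption for illustration only: the text above does not define how fairness is measured, and the function names, flow names, and latency values below are hypothetical.

```python
# Illustrative sketch only: the metric and all numbers are assumptions,
# not definitions or results taken from the text above.

def slowdown(alone_latency_us, shared_latency_us):
    """How many times slower a flow runs when it shares the SSD."""
    return shared_latency_us / alone_latency_us


def fairness(slowdowns):
    """Ratio of the smallest to the largest slowdown.

    1.0 means every flow is slowed down equally; values near 0 mean
    at least one flow suffers far more than another.
    """
    return min(slowdowns) / max(slowdowns)


# Hypothetical average response times (alone, shared) in microseconds.
flows = {
    "low_intensity": (100.0, 400.0),   # heavily disturbed by its neighbor
    "high_intensity": (100.0, 125.0),  # barely affected
}

slowdowns = [slowdown(alone, shared) for alone, shared in flows.values()]
print(fairness(slowdowns))
```

Under this assumed metric, a flow that issues few requests and is slowed down 4x while its high-intensity neighbor is slowed down only 1.25x yields a fairness well below 1.0, which is the kind of imbalance a fairness-aware scheduler tries to reduce.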
