FLIN: Enabling Fairness and Enhancing Performance in Modern NVMe Solid State Drives

Modern solid-state drives (SSDs) use new host-interface protocols, such as NVMe, to provide applications with fast access to storage. These new protocols make use of a concept known as the multi-queue SSD (MQ-SSD), where the SSD has direct access to the application-level I/O request queues. This removes most of the OS software stack that older protocols used to control how and when I/O requests were dispatched to storage devices. Unfortunately, while eliminating the OS software stack leads to a significant performance improvement, we show in this paper that it introduces a new problem: unfairness. This is because removing the OS software stack also removes the mechanisms that provided fairness among applications in older SSDs. To study application-level unfairness, we perform experiments using four real state-of-the-art MQ-SSDs. We demonstrate that the lack of fair scheduling mechanisms leads to high unfairness among concurrently executing applications due to the interference among them. For instance, when one of these applications issues many more I/O requests than the others, the other applications are slowed down significantly. We perform a comprehensive analysis of interference in real MQ-SSDs and find four major interference sources: (1) the intensity of requests sent by each application, (2) differences in request access patterns, (3) the ratio of reads to writes, and (4) garbage collection.

To alleviate unfairness in MQ-SSDs, we propose the Flash-Level INterference-aware scheduler (FLIN). FLIN is a lightweight I/O request scheduling mechanism that provides fairness among requests from different applications. FLIN uses a three-stage scheduling algorithm that protects against all four major sources of interference while respecting the application-level priorities assigned by the host. FLIN is implemented fully within the SSD controller firmware, requires no new hardware, and has negligible (<0.06%) storage cost. Compared to a state-of-the-art I/O scheduler, FLIN improves the fairness and performance of a wide range of enterprise and datacenter storage workloads by an average of 70% and 47%, respectively.
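The abstract reports a fairness improvement without defining the metric. One common definition in the multi-application fairness literature, assumed here rather than quoted from the paper, measures each application's slowdown as its average response time when sharing the SSD divided by its average response time when running alone, and defines fairness as the smallest slowdown divided by the largest (1.0 means every application is slowed down equally). The minimal sketch below computes that; the function names and the sample response times are hypothetical.

```python
from typing import Dict


def slowdown(rt_shared: float, rt_alone: float) -> float:
    """Slowdown of one application: average response time when sharing the
    SSD divided by its average response time when running alone."""
    if rt_alone <= 0:
        raise ValueError("rt_alone must be positive")
    return rt_shared / rt_alone


def fairness(slowdowns: Dict[str, float]) -> float:
    """Assumed metric: min slowdown / max slowdown, in (0, 1].
    1.0 means every application is slowed down by the same factor."""
    values = list(slowdowns.values())
    if not values:
        raise ValueError("need at least one application")
    return min(values) / max(values)


# Hypothetical average response times in microseconds: (shared, alone).
rts = {"light_app": (900.0, 100.0), "heavy_app": (300.0, 200.0)}
s = {name: slowdown(shared, alone) for name, (shared, alone) in rts.items()}
print(s)            # {'light_app': 9.0, 'heavy_app': 1.5}
print(fairness(s))  # 0.16666666666666666
```

A ratio of slowdowns, rather than of raw response times, is used so that an application that is inherently slow when running alone is not counted as unfairly treated.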

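The first interference source, request intensity, can be seen in a toy model. The sketch below is not FLIN. It assumes two hypothetical applications sharing one flash chip with a fixed service time, all requests arriving at time 0, and the heavy application's 8 requests queued ahead of the light application's 2. It compares serving everything in one FIFO against round-robin across per-application queues, and it reports each application's slowdown (shared versus alone response time) plus an assumed fairness score of min slowdown / max slowdown.

```python
from typing import Callable, Dict, List

SERVICE_TIME = 1  # hypothetical service time per request on the shared flash chip


def fifo_completions(arrivals: List[str]) -> Dict[str, List[int]]:
    """Serve requests strictly in arrival order; return completion times per app."""
    done: Dict[str, List[int]] = {}
    t = 0
    for app in arrivals:
        t += SERVICE_TIME
        done.setdefault(app, []).append(t)
    return done


def round_robin_completions(arrivals: List[str]) -> Dict[str, List[int]]:
    """Keep one queue per app and serve one request from each non-empty queue in turn."""
    pending: Dict[str, int] = {}
    for app in arrivals:
        pending[app] = pending.get(app, 0) + 1
    done: Dict[str, List[int]] = {}
    t = 0
    while any(pending.values()):
        for app in pending:  # dicts keep first-arrival order
            if pending[app] > 0:
                pending[app] -= 1
                t += SERVICE_TIME
                done.setdefault(app, []).append(t)
    return done


def mean(xs: List[int]) -> float:
    return sum(xs) / len(xs)


def slowdowns(policy: Callable[[List[str]], Dict[str, List[int]]],
              arrivals: List[str]) -> Dict[str, float]:
    """Mean response time when sharing / mean response time when running alone.
    All requests arrive at time 0, so completion time equals response time."""
    shared = policy(arrivals)
    result: Dict[str, float] = {}
    for app, times in shared.items():
        alone = policy([a for a in arrivals if a == app])[app]
        result[app] = mean(times) / mean(alone)
    return result


# The heavy app's 8 requests are queued ahead of the light app's 2 requests.
arrivals = ["heavy"] * 8 + ["light"] * 2
for name, policy in [("FIFO", fifo_completions), ("round-robin", round_robin_completions)]:
    sd = slowdowns(policy, arrivals)
    score = min(sd.values()) / max(sd.values())
    print(name, {k: round(v, 2) for k, v in sd.items()}, round(score, 2))
# FIFO {'heavy': 1.0, 'light': 6.33} 0.16
# round-robin {'heavy': 1.36, 'light': 2.0} 0.68
```

Round-robin is only a stand-in chosen for brevity. It evens out the intensity effect in this one scenario, but it does not model access patterns, the read/write ratio, garbage collection, or host-assigned priorities, and it is not FLIN's three-stage algorithm.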