OS I/O Path Optimizations for Flash Solid-state Drives

In this paper, we present OS I/O path optimizations for NAND flash solid-state drives, aimed at minimizing the scheduling delays caused by additional execution contexts such as interrupt bottom halves and background queue runs. With our optimizations, these contexts are eliminated, and their work is merged into hardware interrupt handlers or into the I/O-participating threads, without introducing side effects. This was achieved by pipelining fine-grained host controller operations with the cooperation of the I/O-participating threads. To safely expose these fine-grained host controller operations to upper layers, we present a low-level hardware abstraction layer interface. Evaluations with micro-benchmarks showed that our optimizations could accommodate up to five SATA 3.0 SSDs attached to an AHCI controller at 671k IOPS, roughly 1.9 times the 354k IOPS at which the current SCSI-based Linux I/O path was limited; the latter failed to accommodate more than three devices. An evaluation on an SSD-backed key-value system also showed IOPS improvements with our I/O optimizations.
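
The abstract describes removing deferred contexts by letting the thread that issues I/O also prepare commands and reap completions through fine-grained controller operations. The Python sketch below is a toy model of that idea only, under stated assumptions: the controller class, its three operations (`prep`, `issue`, `reap`), the slot count, and the instant-completion model are all hypothetical and are not the paper's hardware abstraction layer or any real driver API. It shows the next command being prepared while earlier ones are in flight, and completions being collected in the caller's own context rather than in a bottom half or queue run.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SimController:
    """Toy host controller with a fixed number of command slots.

    Hypothetical model: every issued command is treated as complete by the
    next reap(), so no real device timing is simulated.
    """
    nslots: int = 32
    prepared: dict = field(default_factory=dict)     # slot -> request
    in_flight: deque = field(default_factory=deque)  # issued slots, FIFO

    def prep(self, slot, req):
        """Build the command for `req` in `slot` without touching the device."""
        self.prepared[slot] = req

    def issue(self, slot):
        """Hand an already prepared slot to the device (the 'doorbell' step)."""
        self.in_flight.append(slot)

    def reap(self):
        """Collect completed commands as (slot, request) pairs."""
        done = []
        while self.in_flight:
            slot = self.in_flight.popleft()
            done.append((slot, self.prepared.pop(slot)))
        return done


def run_io(ctrl, requests):
    """Drive all I/O from the calling thread, with no deferred context.

    Each iteration prepares the next command while earlier ones are still in
    flight, reaps completions in the caller's own context, then issues the
    prepared command.
    """
    if ctrl.nslots < 1:
        raise ValueError("controller needs at least one command slot")
    free = deque(range(ctrl.nslots))
    pending = deque(requests)
    completed = []
    while pending or ctrl.in_flight:
        # Stage 1: prepare the next command (overlaps with in-flight I/O).
        staged = None
        if pending and free:
            staged = free.popleft()
            ctrl.prep(staged, pending.popleft())
        # Stage 2: reap completions here instead of in a bottom half.
        for slot, req in ctrl.reap():
            completed.append(req)
            free.append(slot)
        # Stage 3: issue the command prepared in stage 1.
        if staged is not None:
            ctrl.issue(staged)
    return completed
```

For example, `run_io(SimController(nslots=4), list(range(10)))` returns the ten requests in submission order. A real driver would also have to handle completions that arrive while no participating thread is polling, which the abstract attributes to the hardware interrupt path; this sketch models only the thread-side path.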
