ReFlex: Remote Flash ≈ Local Flash

Remote access to NVMe Flash enables flexible scaling and high utilization of Flash capacity and IOPS within a datacenter. However, existing systems for remote Flash access either introduce significant performance overheads or fail to isolate the multiple remote clients sharing each Flash device. We present ReFlex, a software-based system for remote Flash access, that provides nearly identical performance to accessing local Flash. ReFlex uses a dataplane kernel to closely integrate networking and storage processing to achieve low latency and high throughput at low resource requirements. Specifically, ReFlex can serve up to 850K IOPS per core over TCP/IP networking, while adding 21us over direct access to local Flash. ReFlex uses a QoS scheduler that can enforce tail latency and throughput service-level objectives (SLOs) for thousands of remote clients. We show that ReFlex allows applications to use remote Flash while maintaining their original performance with local Flash.

[1]  Banu Özden,et al.  Disk scheduling with quality of service guarantees , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[2]  Scott Shenker,et al.  Disk-Locality in Datacenter Computing Considered Irrelevant , 2011, HotOS.

[3]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .

[4]  Mor Harchol-Balter,et al.  PriorityMeister: Tail Latency QoS for Shared Networked Storage , 2014, SoCC.

[5]  Rina Panigrahy,et al.  Design Tradeoffs for SSD Performance , 2008, USENIX Annual Technical Conference.

[6]  Kai Shen,et al.  FlashFQ: A Fair Queueing I/O Scheduler for Flash-Based SSDs , 2013, USENIX Annual Technical Conference.

[7]  Anees Shaikh,et al.  Performance Isolation and Fairness for Multi-Tenant Cloud Storage , 2012, OSDI.

[8]  Amin Vahdat,et al.  Chronos: predictable low latency for data center applications , 2012, SoCC '12.

[9]  Irfan Ahmad,et al.  PARDA: Proportional Allocation of Resources for Distributed Storage Access , 2009, FAST.

[10]  Jialin Li,et al.  Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency , 2014, SoCC.

[11]  Scott A. Brandt,et al.  The Design and Implementation of AQuA: An Adaptive Quality of Service Aware Object-Based Storage Device , 2006 .

[12]  Hemal Shah,et al.  A study of iSCSI extensions for RDMA (iSER) , 2003, NICELI '03.

[13]  Peter J. Varman,et al.  mClock: Handling Throughput Variability for Hypervisor IO Scheduling , 2010, OSDI.

[14]  Kang G. Shin,et al.  Maestro: quality-of-service in large disk arrays , 2011, ICAC '11.

[15]  Peter J. Varman,et al.  Efficient and adaptive proportional share I/O scheduling , 2009, PERV.

[16]  Christoforos E. Kozyrakis,et al.  Flash storage disaggregation , 2016, EuroSys.

[17]  Mark Handley,et al.  Network stack specialization for performance , 2013, HotNets.

[18]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition , 2013, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition.

[19]  Christoforos E. Kozyrakis,et al.  Usenix Association 10th Usenix Symposium on Operating Systems Design and Implementation (osdi '12) 335 Dune: Safe User-level Access to Privileged Cpu Features , 2022 .

[20]  Abel Gordon,et al.  Paravirtual Remote I/O , 2016, ASPLOS.

[21]  Christoforos E. Kozyrakis,et al.  IX: A Protected Dataplane Operating System for High Throughput and Low Latency , 2014, OSDI.

[22]  Eunyoung Jeong,et al.  mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems , 2014, NSDI.

[23]  Richard A. Golding,et al.  Zygaria: Storage Performance as a Managed Resource , 2006, 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'06).

[24]  George Varghese,et al.  Efficient fair queueing using deficit round robin , 1995, SIGCOMM '95.

[25]  GhemawatSanjay,et al.  The Google file system , 2003 .

[26]  Alexander S. Szalay,et al.  FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs , 2014, FAST.

[27]  Andrea C. Arpaci-Dusseau,et al.  De-indirection for flash-based SSDs with nameless writes , 2012, FAST.

[28]  Eric Anderson,et al.  Proceedings of the Fast 2002 Conference on File and Storage Technologies Hippodrome: Running Circles around Storage Administration , 2022 .

[29]  Hitesh Ballani,et al.  End-to-end Performance Isolation Through Virtual Datacenters , 2014, OSDI.

[30]  Randy H. Katz,et al.  Cake: enabling high-level SLOs on shared storage systems , 2012, SoCC '12.

[31]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[32]  Peter Desnoyers,et al.  Write Endurance in Flash Drives: Measurements and Analysis , 2010, FAST.

[33]  Fabio Checconi,et al.  High Throughput Disk Scheduling with Fair Bandwidth Distribution , 2010, IEEE Transactions on Computers.

[34]  Christoforos E. Kozyrakis,et al.  Energy proportionality and workload consolidation for latency-critical applications , 2015, SoCC.

[35]  Bianca Schroeder,et al.  sRoute: Treating the Storage Stack Like a Network , 2016, FAST.

[36]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[37]  Tzi-cker Chiueh,et al.  Secure I/O device sharing among virtual machines on multiple hosts , 2013, ISCA.

[38]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[39]  Irfan Ahmad,et al.  Pesto: online storage performance management in virtualized datacenters , 2011, SoCC.

[40]  Steven Swanson,et al.  QuickSAN: a storage area network for fast, distributed, solid state disks , 2013, ISCA.

[41]  Wei Jin,et al.  Interposed proportional sharing for a storage service utility , 2004, SIGMETRICS '04/Performance '04.

[42]  Antony I. T. Rowstron,et al.  IOFlow: a software-defined storage architecture , 2013, SOSP.

[43]  Anand Sivasubramaniam,et al.  Storage performance virtualization via throughput and latency control , 2005, 13th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.

[44]  Philippe Bonnet,et al.  Linux block IO: introducing multi-queue SSD access on multi-core systems , 2013, SYSTOR '13.

[45]  Yong Wang,et al.  SDF: software-defined flash for web-scale internet storage systems , 2014, ASPLOS.

[46]  Prashant J. Shenoy,et al.  Cello: A Disk Scheduling Framework for Next Generation Operating Systems* , 1998, SIGMETRICS '98/PERFORMANCE '98.

[47]  Scott Shenker,et al.  Analysis and simulation of a fair queueing algorithm , 1989, SIGCOMM '89.

[48]  Dan Walsh,et al.  Design and implementation of the Sun network filesystem , 1985, USENIX Conference Proceedings.

[49]  Abhijeet Joglekar,et al.  A scalable and high performance software iSCSI implementation , 2005, FAST'05.

[50]  Kai Shen,et al.  FIOS: a fair, efficient flash I/O scheduler , 2012, FAST.

[51]  Julian Satran,et al.  Internet Small Computer Systems Interface (iSCSI) , 2004, RFC.

[52]  Michael J. Freedman,et al.  From application requests to virtual IOPs: provisioned key-value storage with Libra , 2014, EuroSys '14.

[53]  Peter J. Varman,et al.  pClock: an arrival curve based approach for QoS guarantees in shared storage systems , 2007, SIGMETRICS '07.

[54]  Abhay Parekh,et al.  A generalized processor sharing approach to flow control in integrated services networks-the single node case , 1992, [Proceedings] IEEE INFOCOM '92: The Conference on Computer Communications.

[55]  Abhay Parekh,et al.  A generalized processor sharing approach to flow control in integrated services networks: the single-node case , 1993, TNET.

[56]  Gregory R. Ganger,et al.  Argon: Performance Insulation for Shared Storage Servers , 2007, FAST.