Crail: A High-Performance I/O Architecture for Distributed Data Processing

Effectively leveraging fast networking and storage hardware for distributed data processing remains challenging. Often, hardware integration takes place too low in the stack; as a result, the hardware's performance advantages are overshadowed by software overheads in the higher layers. Moreover, opportunities for fundamental architectural changes within the data processing layer remain unexplored. Crail is a user-level I/O architecture for the Apache data processing ecosystem, designed from the ground up for high-performance networking and storage hardware. With Crail, hardware performance advantages become visible at the application level and translate into workload runtime improvements. In this paper, we discuss the basic concepts of Crail and show its impact on Spark workloads such as sorting and SQL.
