LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism

In multi-tenant systems, the CPU overhead of distributed file systems (DFSes) is increasingly a burden on application performance. CPU and memory interference degrade application and storage performance and make it unstable, especially operation latency. Recent client-local DFSes for persistent memory (PM) accelerate this trend. Offloading the DFS to a SmartNIC is a promising solution to these problems. However, fitting the complex demands of a DFS onto the simple SmartNIC processors, which sit across PCIe from the host, is challenging. We present LineFS, a SmartNIC-offloaded, high-performance DFS with support for client-local PM. To fully leverage the SmartNIC architecture, we decompose DFS operations into execution stages that can be offloaded to a parallel datapath execution pipeline on the SmartNIC. LineFS offloads CPU-intensive DFS tasks to the SmartNIC. These include replication, compression, data publication, and index and consistency management. We implement LineFS on the Mellanox BlueField SmartNIC and compare it to Assise, a state-of-the-art PM DFS. LineFS improves LevelDB latency by up to 80% and Filebench throughput by up to 79%, while providing extended DFS availability during host system failures.
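The decomposition of DFS operations into pipelined execution stages can be illustrated with a minimal sketch. It assumes hypothetical stages (`validate`, `compress`, `replicate`) applied to `(seq, payload)` chunks of a client's update log. These stage names, the `run_pipeline` helper and the in-memory `replica_log` are illustrative assumptions, not LineFS's actual stages or API. Each stage runs in its own thread and hands work to the next stage through a FIFO queue. Different chunks are therefore in different stages at the same time, while per-chunk order is preserved.

```python
import queue
import threading
import zlib

_DONE = object()  # sentinel telling a stage to shut down


def run_stage(fn, inbox, outbox):
    """Worker loop: apply fn to each item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(stages, items):
    """Run items through stages (one thread per stage); return outputs in order."""
    # queues[i] feeds stage i; queues[-1] collects the final results.
    # Unbounded queues keep the sketch deadlock-free without a feeder thread.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while (out := queues[-1].get()) is not _DONE:
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical stages; a real offloaded DFS would do far more work per stage.
replica_log = []  # stand-in for a remote replica's log


def validate(chunk):
    seq, data = chunk
    return seq, data, zlib.crc32(data)  # attach a checksum


def compress(chunk):
    seq, data, crc = chunk
    return seq, zlib.compress(data), crc


def replicate(chunk):
    replica_log.append(chunk)  # pretend to ship the chunk to a replica
    return chunk[0]  # acknowledge by sequence number


log = [(i, f"write {i};".encode() * 10) for i in range(100)]
acks = run_pipeline([validate, compress, replicate], log)
```

The queues are unbounded here for simplicity. A real datapath would bound them to apply backpressure. In CPython the GIL also limits how much the threads truly overlap. The example therefore shows the structure of stage-level pipeline parallelism rather than its speedup on a SmartNIC's cores.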
