CFS: A Distributed File System for Large Scale Container Platforms

We propose CFS, a distributed file system for large-scale container platforms. CFS supports both sequential and random file access, with storage optimized for both large and small files, and it adopts different replication protocols for different write scenarios to improve replication performance. It employs a metadata subsystem that stores and distributes file metadata across storage nodes based on their memory usage. This metadata placement strategy avoids the need for data rebalancing during capacity expansion. CFS also provides POSIX-compliant APIs with relaxed semantics and metadata atomicity to improve system performance. We performed a comprehensive comparison with Ceph, a widely used distributed file system on container platforms. Our experimental results show that, across seven commonly used metadata operations, CFS performs about three times better on average. In addition, CFS exhibits better random read/write performance in highly concurrent environments with multiple clients and processes.
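
The memory-usage-based metadata placement described above can be illustrated with a small sketch. It is not the CFS implementation: the `MetaNode` class, the `place_partition` function, and the "pick the node with the lowest memory usage" rule are all assumptions made for illustration. The point it tries to show is that when a new node joins, new metadata partitions naturally land on it, while existing partitions stay where they are.

```python
from dataclasses import dataclass, field


@dataclass
class MetaNode:
    """Hypothetical metadata node tracked only by its memory budget."""
    name: str
    capacity_bytes: int
    used_bytes: int = 0
    partitions: list = field(default_factory=list)

    @property
    def usage(self) -> float:
        # Fraction of memory currently occupied by metadata.
        return self.used_bytes / self.capacity_bytes


def place_partition(nodes, partition_id, estimated_bytes):
    """Assign a new metadata partition to the node with the lowest memory usage.

    Assumed policy for illustration; existing partitions are never moved.
    """
    candidates = [
        n for n in nodes
        if n.used_bytes + estimated_bytes <= n.capacity_bytes
    ]
    if not candidates:
        raise RuntimeError("no metadata node has enough free memory")
    target = min(candidates, key=lambda n: n.usage)
    target.partitions.append(partition_id)
    target.used_bytes += estimated_bytes
    return target


# Two existing nodes, then a capacity expansion with an empty third node.
nodes = [
    MetaNode("a", capacity_bytes=100, used_bytes=80, partitions=["p1"]),
    MetaNode("b", capacity_bytes=100, used_bytes=50, partitions=["p2"]),
]
first = place_partition(nodes, "p3", estimated_bytes=10)

nodes.append(MetaNode("c", capacity_bytes=100))
second = place_partition(nodes, "p4", estimated_bytes=10)
```

In this sketch, `p3` goes to node `b` (the less loaded of the two original nodes), and after node `c` is added, `p4` goes to `c`. Partitions `p1` and `p2` are untouched, which is the sense in which expansion requires no rebalancing under this assumed policy.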
