How to Teach an Old File System Dog New Object Store Tricks

Many data service platforms use local file systems as their backend storage. Although this approach offers advantages in portability, extensibility, and ease of development, it can suffer severe performance degradation if the mapping between the services the platform requires and the functions the local file system provides is not carefully managed. This paper presents an in-depth analysis of the performance problems in current data service platforms that use file systems as their backend storage and proposes three novel strategies that are essential to solving them. We demonstrate the efficacy of our strategies by implementing a prototype object store in Ceph, called SwimStore (Shadowing with Immutable Metadata Store). We show experimentally that SwimStore provides high performance with little variation, as well as a large reduction in write traffic.
