BTRFS: The Linux B-Tree Filesystem

BTRFS is a Linux filesystem that has been adopted as the default filesystem in some popular versions of Linux. It is based on copy-on-write, allowing for efficient snapshots and clones. It uses B-trees as its main on-disk data structure. The design goal is to work well for many use cases and workloads. To this end, much effort has been directed to maintaining even performance as the filesystem ages, rather than trying to support a particular narrow benchmark use-case.

Linux filesystems are installed on smartphones as well as enterprise servers. This entails challenges on many different fronts.

- Scalability. The filesystem must scale in many dimensions: disk space, memory, and CPUs.
- Data integrity. Losing data is not an option, and much effort is expended to safeguard the content. This includes checksums, metadata duplication, and RAID support built into the filesystem.
- Disk diversity. The system should work well with SSDs and hard disks. It is also expected to be able to use an array of different sized disks, which poses challenges to the RAID and striping mechanisms.

This article describes the core ideas, data structures, and algorithms of this filesystem. It sheds light on the challenges posed by defragmentation in the presence of snapshots, and the tradeoffs required to maintain even performance in the face of a wide spectrum of workloads.
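
As a rough illustration of why copy-on-write makes snapshots cheap, the following is a minimal sketch and not BTRFS's actual on-disk format or code. It assumes a small in-memory binary search tree in place of a B-tree, and the names `Node`, `insert`, and `lookup` are hypothetical. An update copies only the nodes on the path from the root to the changed key, so a snapshot is simply a saved reference to an old root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node (hypothetical; stands in for a B-tree block)."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return a new root, copying only the nodes on the path to `key`."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    # Same key: replace the value in a fresh copy, reuse both subtrees.
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: int) -> Optional[str]:
    """Ordinary binary-search lookup starting from any root."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


live = None
for k, v in [(50, "a"), (30, "b"), (70, "c")]:
    live = insert(live, k, v)

snapshot = live                 # taking a snapshot is just keeping the old root
live = insert(live, 30, "b2")   # the update copies the path 50 -> 30 only
```

After the update, `snapshot` still sees the old value for key 30 while `live` sees the new one, and the untouched subtree under key 70 is shared between the two roots rather than copied.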
