iFlatLFS: Performance optimization for accessing massive small files

Processing massive numbers of small files is a challenge in the design of distributed file systems. The combined-block-storage approach is currently prevalent. However, this approach relies on traditional file systems such as ExtFS, which can make random access to small files inefficient. This paper focuses on optimizing the performance of data servers when accessing massive numbers of small files. We present a Flat Lightweight File System (iFlatLFS) for managing small files, based on a simple metadata scheme and a flat storage architecture. iFlatLFS is intended to replace the traditional file system on data servers that mainly store small files, and it greatly simplifies the original data access procedure. The metadata scheme proposed in this paper occupies only a fraction of the space that metadata requires under traditional file systems. We have implemented iFlatLFS on CentOS 5.5 and integrated it into an open-source Distributed File System (DFS) called Taobao FileSystem (TFS), which was developed by Alibaba, a leading B2C service provider in China, and manages over 28.6 billion small photos. We have conducted extensive experiments to verify the performance of iFlatLFS. The results show that, for file sizes ranging from 1 KB to 64 KB, iFlatLFS is faster than Ext4 by 48% and 54% on average for random read and random write in the DFS environment, respectively. Moreover, after integration, iFlatLFS-based TFS is faster than the existing Ext4-based TFS by 45% and 49% on average for random read access and hybrid access (a mix of read and write accesses), respectively.
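To make the idea of a flat layout with lightweight metadata concrete, here is a minimal Python sketch. It is not the iFlatLFS implementation and does not reproduce its on-disk format: the class name `FlatStore`, the in-memory index, and the `(offset, length)` metadata record are all assumptions chosen for illustration. It only shows how many small files can be packed into one large block file, so that a read needs a single index lookup and a single seek.

```python
import os


class FlatStore:
    """Toy flat store (hypothetical): small files packed into one block file."""

    def __init__(self, path):
        # One backing file holds the data of every small file.
        # "w+b" creates (or truncates) the file and allows reads and writes.
        self._f = open(path, "w+b")
        # Assumed metadata layout: file_id -> (offset, length).
        self._index = {}

    def put(self, file_id, data):
        # Append the payload at the current end of the block file.
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(data)
        self._f.flush()
        # The whole per-file metadata record is just two integers.
        self._index[file_id] = (offset, len(data))

    def get(self, file_id):
        # One index lookup, one seek, one read.
        offset, length = self._index[file_id]
        self._f.seek(offset)
        return self._f.read(length)

    def close(self):
        self._f.close()
```

In this sketch, a call such as `store.put("photo-1", payload)` followed by `store.get("photo-1")` returns the same bytes without touching any per-file directory entry or inode. A real system would also need to persist the index and handle deletion and crash recovery, which are omitted here.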
