Hmfs: Efficient Support of Small Files Processing over HDFS

Storing and accessing massive numbers of small files is one of the challenges in the design of distributed file systems. The Hadoop Distributed File System (HDFS) is primarily designed for reliable storage and fast access of very large files, and its performance degrades as the number of small files increases. This paper proposes a middleware called Hmfs to improve the efficiency of storing and accessing small files on HDFS. It consists of three layers: file operation interfaces that make it easier for software developers to submit different file requests; file management tasks that merge small files into big ones, or extract small files from big ones, in the background; and file buffers that improve I/O performance. Hmfs boosts file upload speed with an asynchronous write mechanism and file download speed with a prefetching and caching strategy. Experimental results show that Hmfs helps achieve high storage and access speeds for massive numbers of small files on HDFS.
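The merge and extract tasks mentioned in the abstract can be illustrated with a minimal in-memory sketch. It is not the Hmfs implementation: the abstract does not describe the on-disk layout, so the index format (file name mapped to offset and length) and the function names `merge_small_files` and `extract_small_file` are assumptions made purely for illustration, and no HDFS API is involved.

```python
import io
from typing import Dict, Tuple

# Hypothetical index layout: file name -> (offset, length) inside the merged blob.
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    buf = io.BytesIO()
    index: Index = {}
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))  # position before writing
        buf.write(data)
    return buf.getvalue(), index


def extract_small_file(blob: bytes, index: Index, name: str) -> bytes:
    """Recover one small file from the merged blob using the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


small = {"a.txt": b"hello", "b.txt": b"", "c.txt": b"world!"}
blob, index = merge_small_files(small)
```

In a real deployment the blob would be a large file stored on HDFS and the index would have to be persisted alongside it; the sketch only shows why a single large file plus a compact index lets many small files be stored and looked up without one file system entry per small file.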
