Enhanced dual Bloom filter based on SSD for efficient directory parsing in cloud storage system

A file system used for big data analytics holds hundreds of thousands of files. In such a large storage system, retrieving the metadata of a file takes a long time. In this paper we propose an enhanced Bloom filter to accelerate directory parsing in large-scale file systems. A cache implemented on SSD keeps the metadata of directories and files that were accessed frequently or recently. When a file is requested, the system first attempts to get its metadata from the SSD. If the metadata is not in the cache, the SSD access is wasted time. To avoid such unnecessary SSD accesses, we propose the flag-augmented Bloom filter (FABF), which predicts whether the metadata of the requested file is present in the cache. Analytical modeling demonstrates that the false positive rate and the false negative rate are both reduced compared to the existing scheme. In addition, the implementation overhead of the proposed scheme is small.
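The lookup path described above, where a filter is consulted before the SSD cache is touched, can be sketched as follows. This is a minimal illustration that uses a plain Bloom filter; the abstract does not specify how the flag bits of the FABF work, so they are not modeled. The names `BloomFilter`, `MetadataCache`, `might_contain`, and `ssd_reads` are hypothetical, the SSD is stood in for by an in-memory dictionary, and cache eviction is left out.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (assumption: the paper's flag bits are NOT modeled)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely never added"; True means "possibly added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class MetadataCache:
    """Hypothetical metadata cache; a dict stands in for the SSD."""

    def __init__(self, bloom: BloomFilter) -> None:
        self.bloom = bloom
        self.ssd = {}        # stand-in for SSD-resident metadata
        self.ssd_reads = 0   # counts how often the "SSD" is actually read

    def put(self, path: str, metadata: dict) -> None:
        self.ssd[path] = metadata
        self.bloom.add(path)

    def get(self, path: str):
        if not self.bloom.might_contain(path):
            return None      # predicted miss: the SSD access is skipped
        self.ssd_reads += 1  # predicted hit: may still be a false positive
        return self.ssd.get(path)


cache = MetadataCache(BloomFilter(num_bits=1024, num_hashes=3))
cache.put("/data/logs/a.txt", {"size": 4096})
hit = cache.get("/data/logs/a.txt")
miss = cache.get("/data/logs/missing.txt")
```

In this sketch a false positive costs one wasted SSD read, which is the cost the abstract aims to reduce. A plain Bloom filter also cannot remove entries, so a real cache with eviction would need some extra mechanism to keep the filter consistent with the cache contents.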
