A Bloom Filter-Based Index for Distributed Storage Systems

An indexing technique that can locate any stored item is a key component of a distributed storage system. Many index designs have been proposed for distributed systems, but they face a common problem: the number of items is large while the space available for the index is relatively small. In this paper we propose a Bloom filter-based scheme for representing and looking up items in distributed systems. Each node selects items and inserts them into a probabilistic data structure. After gathering these data structures from all nodes, the index node holds information about the recorded objects and can locate items in the system. To reduce the number of false checks caused by the index, we choose the items to be recorded with reference to Internet user behavior patterns. We evaluate the proposal through theoretical and experimental analysis. The results show that the method achieves high performance with limited index space.
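The scheme described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class and function names (`BloomFilter`, `build_node_filter`, `locate`), the filter parameters, the double-hashing construction, and the "record the top-k most requested items" selection rule are all assumptions made for illustration. The abstract states only that items are chosen with reference to user behavior patterns.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; sizes are illustrative, not taken from the paper."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_node_filter(request_counts, top_k, **filter_args):
    """Record only the top_k most requested items of one node.

    Assumed selection rule: rank items by observed request count.
    """
    ranked = sorted(request_counts, key=request_counts.get, reverse=True)
    bloom = BloomFilter(**filter_args)
    for item in ranked[:top_k]:
        bloom.add(item)
    return bloom


def locate(index, item):
    """Return the nodes whose filter reports the item (candidates only)."""
    return [node for node, bloom in index.items() if item in bloom]


# The index node gathers one filter per storage node (hypothetical data).
index = {
    "node-a": build_node_filter({"video1": 900, "video2": 40, "doc7": 3}, top_k=2),
    "node-b": build_node_filter({"img3": 500, "doc9": 12}, top_k=1),
}
candidates = locate(index, "video1")
```

Because a Bloom filter has no false negatives, every node that recorded an item always appears among the candidates. A node that never stored the item may also appear, which is the kind of false check the abstract aims to reduce. Items left out of a filter, such as `doc7` here, would need a fallback lookup path that this sketch does not model.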
