Improving retouched Bloom filter for trading off selected false positives against false negatives

Where distributed agents must share voluminous set membership information, Bloom filters provide a compact, though lossy, way to do so. Numerous recent networking papers have examined the trade-off between the bandwidth consumed by transmitting Bloom filters and their error rate, which takes the form of false positives. This paper is about the retouched Bloom filter (RBF), an extension that makes the Bloom filter more flexible by permitting the removal of selected false positives at the expense of introducing false negatives, and that allows a controlled trade-off between the two. We show analytically that creating RBFs through a random process decreases the false positive rate in the same proportion as the false negative rate it generates. We further provide simple heuristics for creating RBFs that decrease the false positive rate by more than the corresponding increase in the false negative rate. These heuristics are more effective than the ones we presented in prior work. We then demonstrate the advantages of an RBF over a Bloom filter in a distributed network topology measurement application. Finally, we discuss several networking applications that could benefit from using RBFs instead of standard Bloom filters.
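To make the trade-off concrete, the following is a minimal sketch of the retouching idea: a troublesome false positive is removed by resetting one of the bits it maps to, which may turn some true members into false negatives. The class name, the salted SHA-256 hashing, the filter sizes, and the choice to reset a uniformly random bit among the key's positions are all assumptions made for illustration; they are not necessarily the heuristics or parameters evaluated in the paper.

```python
import hashlib
import random


class RetouchedBloomFilter:
    """Illustrative Bloom filter with a bit-clearing ("retouching") operation."""

    def __init__(self, m, k):
        self.m = m              # number of bits in the filter
        self.k = k              # number of hash functions
        self.bits = [0] * m

    def _positions(self, key):
        # Assumption: k hash functions derived from SHA-256 with a salt index.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key):
        return all(self.bits[p] for p in self._positions(key))

    def remove_false_positive(self, key, rng=random):
        # Reset one bit among the key's positions (chosen at random here).
        # The key is no longer reported as present, but any member that
        # also maps to this bit becomes a false negative.
        p = rng.choice(self._positions(key))
        self.bits[p] = 0
        return p


def demo():
    rbf = RetouchedBloomFilter(m=64, k=3)
    members = [f"member-{i}" for i in range(12)]
    for key in members:
        rbf.add(key)

    # Search non-member keys until one is wrongly reported as present.
    probes = (f"probe-{i}" for i in range(10000))
    false_positive = next(key for key in probes if key in rbf)

    rbf.remove_false_positive(false_positive, random.Random(0))
    false_negatives = [key for key in members if key not in rbf]
    return false_positive, false_positive in rbf, false_negatives


if __name__ == "__main__":
    print(demo())
```

In this sketch, `demo()` returns the removed false positive, whether it is still reported as present, and the list of members that became false negatives as a side effect. A smarter choice of which bit to reset is where heuristics of the kind the abstract describes would come in.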
