NAE-SAT-based probabilistic membership filters

Probabilistic membership filters are data structures designed to quickly check whether an element of a large data set belongs to a given subset of that data. False negatives are impossible, but false positives can occur. The main goal of a good probabilistic membership filter is therefore a small false-positive rate combined with memory efficiency and fast queries. Bloom filters are fast to construct, but their memory efficiency is capped by a strict theoretical bound. Weaver et al. introduced filters based on random satisfiability (SAT) that significantly improve efficiency, at the cost of solving a hard random SAT formula when the filter is built. Here we present an improved SAT filter approach that focuses on reducing both filter construction and query times. The approach builds filters from not-all-equal (NAE) SAT formulas, which are solved by mapping them to random SAT and applying traditionally fast random SAT solvers. It also uses bit packing and reduces the number of hash functions. Paired with fast hardware, NAE-SAT filters could be suitable for enterprise-scale applications.
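The ingredients named above are NAE-SAT clauses, several solutions bit-packed into one word per variable, and a query that checks all of them at once. The Python sketch below illustrates how they fit together under stated assumptions; it is not the authors' implementation. The hashing scheme in `clause_for` is hypothetical: each item maps to `K` distinct variable indices shared by `S` instances, with independent literal signs per instance. The parameters `K` and `S` and all function names are likewise illustrative. For self-containment, a simple WalkSAT-style random walk stands in for the random SAT solvers mentioned above, and `nae_to_sat` shows one standard NAE-to-SAT reduction (each NAE clause becomes two ordinary clauses) that could feed such a solver. Under idealized hashing, a non-member's literals are all equal in a given instance with probability 2^(1-K), so the expected false-positive rate is (1 - 2^(1-K))^S.

```python
import hashlib
import random

# Illustrative parameters (hypothetical, not taken from the paper).
K = 3              # literals per NAE clause
S = 8              # independent instances, packed as S bits per variable
MASK = (1 << S) - 1


def clause_for(item: bytes, n_vars: int):
    """Hypothetical hashing scheme: K distinct variable indices shared by all S
    instances, plus one S-bit word of literal signs per position (bit j is the
    sign of that literal in instance j)."""
    r = random.Random(hashlib.blake2b(item).digest())
    vars_ = r.sample(range(n_vars), K)
    signs = [r.getrandbits(S) for _ in range(K)]
    return vars_, signs


def nae_to_sat(lits):
    """Standard reduction: NAE(l1, ..., lk) holds iff (l1 or ... or lk) and
    (not l1 or ... or not lk) both hold. Literals are DIMACS-style signed ints."""
    return [list(lits), [-lit for lit in lits]]


def all_equal(assign, clause, j):
    """True if every literal of the clause has the same value in instance j."""
    vars_, signs = clause
    values = {assign[v] ^ ((s >> j) & 1) for v, s in zip(vars_, signs)}
    return len(values) == 1


def solve_nae(clauses, n_vars, j, rng, max_flips=100_000):
    """Focused random walk (a WalkSAT-style stand-in for a real SAT solver):
    while some clause has all literals equal, flip a random variable in it."""
    assign = [rng.randint(0, 1) for _ in range(n_vars)]
    for _ in range(max_flips):
        violated = [c for c in clauses if all_equal(assign, c, j)]
        if not violated:
            return assign
        vars_, _ = rng.choice(violated)
        assign[rng.choice(vars_)] ^= 1
    raise RuntimeError("no NAE-SAT assignment found; lower the clause density")


def build(items, n_vars, seed=0):
    """Solve the S instances and pack their solutions: bit j of packed[v] is the
    value of variable v in the solution of instance j."""
    rng = random.Random(seed)
    clauses = [clause_for(x, n_vars) for x in items]
    packed = [0] * n_vars
    for j in range(S):
        assign = solve_nae(clauses, n_vars, j, rng)
        for v, bit in enumerate(assign):
            packed[v] |= bit << j
    return packed


def query(packed, item):
    """Check all S instances at once with word-wide bitwise operations."""
    vars_, signs = clause_for(item, len(packed))
    ones, zeros = MASK, MASK
    for v, s in zip(vars_, signs):
        w = (packed[v] ^ s) & MASK  # bit j = literal value under solution j
        ones &= w                   # bit j stays set iff all literals are 1
        zeros &= ~w & MASK          # bit j stays set iff all literals are 0
    # The item passes only if no instance has all-equal literals.
    return (ones | zeros) == 0
```

For example, `build([f"member-{i}".encode() for i in range(200)], n_vars=256)` yields a filter that accepts every member, and on random non-members the pass rate should be close to 0.75^8 ≈ 0.10. Taking efficiency as log2(1/FP) divided by the bits stored per element (here S·n/m = 10.24), this toy setting reaches only about 0.32, below the ln 2 ≈ 0.69 achievable by an optimally tuned Bloom filter. In general the efficiency is (m/n)·log2(1/(1 - 2^(1-K))), so it grows with the clause density m/n. This is why SAT-based filters are built from dense formulas, which are harder to solve.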

[1] Matthias Troyer et al., Feedback-optimized parallel tempering Monte Carlo, 2006, cond-mat/0602085.

[2] C. Thompson, The Statistical Mechanics of Phase Transitions, 1978.

[3] Krzysztof Apt et al., Principles of Constraint Programming: Constraint propagation algorithms, 2003.

[4] Andrei Broder et al., Network Applications of Bloom Filters: A Survey, 2004, Internet Math.

[5] Hidetoshi Nishimori (西森 秀稔), Statistical physics of spin glasses and information processing: an introduction, 2001.

[6] Sasu Tarkoma et al., Theory and Practice of Bloom Filters for Distributed Systems, 2012, IEEE Communications Surveys & Tutorials.

[7] Andrew J. Mayer et al., Satisfiability-based Set Membership Filters, 2014, J. Satisf. Boolean Model. Comput.

[8] K. Hukushima et al., Exchange Monte Carlo Method and Application to Spin Glass Simulations, 1995, cond-mat/9512035.

[9] Konstantinos Panagiotou et al., Going after the k-SAT threshold, 2013, STOC '13.

[10] Burton H. Bloom, Space/time trade-offs in hash coding with allowable errors, 1970, CACM.

[11] Zheng Zhu et al., borealis—A generalized global update algorithm for Boolean optimization problems, 2016, Optimization Letters.

[12] Michael W. Deem et al., Parallel tempering: theory, applications, and new perspectives, 2005, Physical Chemistry Chemical Physics (PCCP).

[13] Abdul Sattar et al., NuMVC: An Efficient Local Search Algorithm for Minimum Vertex Cover, 2014, J. Artif. Intell. Res.

[14] Toby Walsh et al., Handbook of Satisfiability, 2009.

[15] Rina Dechter et al., Constraint Processing, 1995, Lecture Notes in Computer Science.