Efficiently evaluating complex Boolean expressions

The problem of efficiently evaluating a large collection of complex Boolean expressions, beyond simple conjunctions and Disjunctive/Conjunctive Normal Forms (DNF/CNF), arises in many emerging online advertising applications such as advertising exchanges and automatic targeting. The simple solution of normalizing complex Boolean expressions to DNF or CNF and then applying existing methods for evaluating such expressions is not always effective, because normalization can cause an exponential blow-up in the size of the expressions. We thus propose a novel method for evaluating complex expressions, which leverages existing techniques for evaluating leaf-level conjunctions and then uses bottom-up evaluation to process only the relevant parts of the complex expressions, namely those that contain the matching conjunctions. We develop two such bottom-up evaluation techniques, one based on Dewey IDs and another based on mapping Boolean expressions to one-dimensional intervals. Our experimental evaluation, based on data obtained from an online advertising exchange, shows that the proposed techniques are efficient and scalable with respect to both space usage and evaluation time.
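
To make the bottom-up idea concrete, the following is a minimal sketch and not the Dewey ID or interval technique summarized above. It assumes that an existing conjunction index has already returned the set of matching leaf conjunctions, that expressions are AND/OR trees without negation, and that each node keeps a pointer to its parent. The names `Node` and `evaluate_bottom_up` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Node:
    """AND/OR tree node; a leaf (op=None) stands for one conjunction."""

    def __init__(self, op=None, children=()):
        self.op = op                      # "AND", "OR", or None for a leaf
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def evaluate_bottom_up(root, matched_leaves):
    """Return True if the tree rooted at `root` is satisfied.

    `matched_leaves` is assumed to hold distinct leaf nodes reported as
    matching by some existing conjunction index (not implemented here).
    Only ancestors of matched leaves are ever visited.
    """
    satisfied_children = defaultdict(int)  # node -> satisfied child count
    worklist = list(matched_leaves)
    while worklist:
        node = worklist.pop()
        if node is root:
            return True
        parent = node.parent
        satisfied_children[parent] += 1
        # AND needs every child satisfied; OR needs exactly one to fire.
        needed = len(parent.children) if parent.op == "AND" else 1
        if satisfied_children[parent] == needed:
            worklist.append(parent)        # each node is queued at most once
    return False


# Illustrative expression: (A AND B) OR (C AND (D OR E))
A, B, C, D, E = (Node() for _ in range(5))
expr = Node("OR", [Node("AND", [A, B]),
                   Node("AND", [C, Node("OR", [D, E])])])
```

With this tree, matching leaves `C` and `E` satisfy the expression, while matching only `A` and `D` does not. Subtrees with no matching leaf are never touched, which is the property the bottom-up approach relies on; the parent-pointer counters used here are a simplification chosen for brevity.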
