SuRF: Practical Range Query Filtering with Fast Succinct Tries

We present the Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST), which matches the point and range query performance of state-of-the-art order-preserving indexes while consuming only 10 bits per trie node. The false positive rates of SuRF for both point and range queries are tunable to satisfy different application needs. We evaluate SuRF in RocksDB as a replacement for its Bloom filters, reducing I/O by filtering requests before they access on-disk data structures. Our experiments on a 100 GB dataset show that replacing RocksDB's Bloom filters with SuRFs speeds up open-seek (without upper bound) and closed-seek (with upper bound) queries by up to 1.5× and 5×, respectively, at a modest cost to worst-case (all-missing) point query throughput caused by a slightly higher false positive rate.
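
The behavior described above, approximate point and range membership with false positives but no false negatives, can be illustrated with a minimal sketch. This is not SuRF's FST-based design: it assumes a much simpler scheme that keeps a sorted list of fixed-length key prefixes, and the class name `PrefixRangeFilter`, its methods, and the `prefix_len` parameter are all hypothetical names chosen for this illustration.

```python
from bisect import bisect_left


class PrefixRangeFilter:
    """Toy range filter over truncated key prefixes (illustrative only).

    Assumption: keys are strings compared lexicographically. Truncating
    keys to a fixed-length prefix preserves order, so the filter can say
    "definitely absent" safely, but it may report false positives.
    """

    def __init__(self, keys, prefix_len=4):
        self.prefix_len = prefix_len
        # Deduplicate and sort the truncated prefixes once at build time.
        self.prefixes = sorted({k[:prefix_len] for k in keys})

    def may_contain(self, key):
        """Point query: False means the key is definitely not stored."""
        p = key[:self.prefix_len]
        i = bisect_left(self.prefixes, p)
        return i < len(self.prefixes) and self.prefixes[i] == p

    def may_contain_range(self, low, high):
        """Closed-range query on [low, high]: False means no stored key
        can fall inside the range."""
        lo = low[:self.prefix_len]
        hi = high[:self.prefix_len]
        # First stored prefix >= truncated lower bound.
        i = bisect_left(self.prefixes, lo)
        return i < len(self.prefixes) and self.prefixes[i] <= hi


f = PrefixRangeFilter(["apple", "apricot", "banana", "cherry"], prefix_len=3)
print(f.may_contain("apple"))             # stored key
print(f.may_contain("application"))       # shares prefix "app": false positive
print(f.may_contain("avocado"))           # no matching prefix
print(f.may_contain_range("b", "bz"))     # covers "banana"
print(f.may_contain_range("d", "z"))      # nothing stored at or after "d"
```

In this sketch a longer `prefix_len` lowers the false positive rate at the cost of more space, which loosely mirrors the tunable trade-off the abstract mentions; the actual mechanism in SuRF is different.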
