BF-Tree: Approximate Tree Indexing

The increasing volume of time-generated data and the shift in storage technologies suggest that indexing needs to be reconsidered. Several workloads, such as social and service monitoring, often include attributes that are implicitly clustered because of their time-dependent nature. In addition, solid-state disks (SSD), built on flash or other low-level technologies, are emerging as viable competitors to hard disk drives (HDD). Capacity and access times create a trade-off between the two: SSD replace the slow random accesses of HDD with efficient ones, but SSD capacity is one or more orders of magnitude more expensive than HDD capacity. Indexing, however, has been designed assuming HDD as secondary storage, and thus minimizes random accesses at the expense of capacity. Indexing data with SSD as secondary storage requires treating capacity as a scarce resource. To this end, we introduce approximate tree indexing, which employs probabilistic data structures (Bloom filters) to trade accuracy for size and produce smaller, yet powerful, tree indexes, which we name Bloom filter trees (BF-Trees). BF-Trees exploit pre-existing data ordering or partitioning to offer competitive search performance. We demonstrate, both by an analytical study and by experimental results, that by using workload knowledge and reducing indexing accuracy to some extent, we can save substantially on capacity when indexing ordered or partitioned attributes. In particular, in experiments with a synthetic workload, approximate indexing offers a 2.22x-48x smaller index footprint with competitive response times, and in experiments with TPC-H and a real-life monitoring dataset from an energy company, it offers a 1.6x-4x smaller index footprint, again with competitive search times.
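To make the accuracy-for-size trade-off concrete, the following is a minimal Python sketch of the general idea, not the BF-Tree implementation itself. It assumes that data pages are grouped into partitions whose key ranges are ordered relative to each other, while keys inside a partition may be spread arbitrarily across its pages. Each hypothetical `ApproxLeaf` stores only a minimum key and one small Bloom filter per page instead of one entry per key. All class names, parameters (`pages_per_leaf`, `bits_per_key`, `num_hashes`), and the hashing scheme are illustrative assumptions.

```python
import hashlib
from bisect import bisect_right


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class ApproxLeaf:
    """Hypothetical leaf: a min key plus one Bloom filter per data page."""

    def __init__(self, page_ids, pages, bits_per_key, num_hashes):
        # Assumes every page holds at least one key.
        self.min_key = min(k for pid in page_ids for k in pages[pid])
        self.filters = []
        for pid in page_ids:
            bf = BloomFilter(max(8, bits_per_key * len(pages[pid])), num_hashes)
            for k in pages[pid]:
                bf.add(k)
            self.filters.append((pid, bf))


class ApproxTreeIndex:
    """Sketch of an approximate index over partitioned data pages.

    Assumption: every key in partition i is smaller than every key in
    partition i + 1; order inside a partition is not required.
    """

    def __init__(self, pages, pages_per_leaf=4, bits_per_key=10, num_hashes=7):
        self.pages = pages
        self.leaves = [
            ApproxLeaf(range(s, min(s + pages_per_leaf, len(pages))),
                       pages, bits_per_key, num_hashes)
            for s in range(0, len(pages), pages_per_leaf)
        ]
        self.leaf_min_keys = [leaf.min_key for leaf in self.leaves]

    def search(self, key):
        """Return (page_id or None, number of data pages read)."""
        i = bisect_right(self.leaf_min_keys, key) - 1
        if i < 0:
            return None, 0
        pages_read = 0
        for pid, bf in self.leaves[i].filters:
            if bf.might_contain(key):
                pages_read += 1              # simulated page access
                if key in self.pages[pid]:   # verify: filter may be wrong
                    return pid, pages_read
        return None, pages_read
```

In this sketch, lowering `bits_per_key` shrinks the index but raises the false-positive rate, so a lookup may read pages that do not hold the key. Lookups for present keys never miss, because a Bloom filter has no false negatives.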
