FastBit: interactively searching massive data

As scientific instruments and computer simulations produce more and more data, the task of locating the essential information to gain insight becomes increasingly difficult. FastBit is an efficient software tool to address this challenge. In this article, we present a summary of the key underlying technologies, namely bitmap compression, encoding, and binning. Together these techniques enable FastBit to answer structured (SQL) queries orders of magnitude faster than popular database systems. To illustrate how FastBit is used in applications, we present three examples involving a high-energy physics experiment, a combustion simulation, and an accelerator simulation. In each case, FastBit significantly reduces the response time and enables interactive exploration on terabytes of data.
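To give a rough sense of how binning and bitmap encoding can combine to answer a range condition, here is a minimal sketch. It is not FastBit's API or its actual implementation: the function names, the bin boundaries, and the use of Python integers as uncompressed bitmaps are all assumptions made for illustration, and bitmap compression is omitted entirely. Each bin gets one bitmap with one bit per row; a query ORs together the bitmaps of bins that lie fully inside the range and checks the raw values only for rows in the bin that straddles the query boundary.

```python
from bisect import bisect_right


def build_binned_index(values, boundaries):
    """Build one bitmap per bin (hypothetical helper, not a FastBit call).

    Bin i covers the half-open interval [boundaries[i], boundaries[i + 1]).
    Each bitmap is a Python int whose bit `row` is set when that row's
    value falls in the bin.
    """
    bitmaps = [0] * (len(boundaries) - 1)
    for row, value in enumerate(values):
        bin_id = bisect_right(boundaries, value) - 1
        if not 0 <= bin_id < len(bitmaps):
            raise ValueError(f"value {value!r} is outside the bin range")
        bitmaps[bin_id] |= 1 << row
    return bitmaps


def query_less_than(values, boundaries, bitmaps, threshold):
    """Return the sorted row numbers where values[row] < threshold."""
    hits = 0        # rows known to satisfy the condition from the index alone
    candidates = 0  # rows in a bin that straddles the threshold
    for i, bitmap in enumerate(bitmaps):
        low, high = boundaries[i], boundaries[i + 1]
        if high <= threshold:
            hits |= bitmap        # whole bin lies below the threshold
        elif low < threshold:
            candidates |= bitmap  # edge bin: must consult the raw values

    # Candidate check: read raw values only for rows in the edge bin.
    row = 0
    while candidates:
        if candidates & 1 and values[row] < threshold:
            hits |= 1 << row
        candidates >>= 1
        row += 1

    return [r for r in range(len(values)) if (hits >> r) & 1]


if __name__ == "__main__":
    data = [0.5, 3.2, 7.9, 1.1, 4.4, 9.0, 2.5]
    edges = [0, 2, 4, 6, 8, 10]  # five bins of width 2 (illustrative choice)
    index = build_binned_index(data, edges)
    print(query_less_than(data, edges, index, 3.0))
```

In this sketch only the rows in the single edge bin require a look at the raw data; every other bin is resolved with bitwise operations on the index alone.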
