A Survey of Bitmap Index Compression Algorithms for Big Data

With the growing popularity of Internet applications and the widespread use of the mobile Internet, Internet traffic has grown rapidly over the past two decades. Internet Traffic Archival Systems (ITAS) for packets or flow records are increasingly used in network monitoring, network troubleshooting, and the analysis of user behavior and experience. Of the three key technologies in ITAS, this paper focuses on bitmap index compression algorithms and surveys them in detail. The current state-of-the-art bitmap index encoding schemes include BBC, WAH, PLWAH, EWAH, PWAH, CONCISE, COMPAX, VLC, DF-WAH, and VAL-WAH. We categorize these algorithms according to their differences in segmentation, chunking, merge compression, and Near Identical (NI) features. We also propose several new bitmap index encoding algorithms, namely SECOMPAX, ICX, MASC, and PLWAH+, and present the state diagrams of their encoding procedures. We then evaluate the CPU and GPU implementations of these algorithms on a real Internet trace from CAIDA. Finally, we summarize the results and discuss future directions for bitmap index compression algorithms. Beyond its application in network security and network forensics, bitmap index compression, with its faster bitwise logical operations and reduced search space, is widely used in the analysis of genome data, geographical information systems, graph databases, image retrieval, the Internet of Things, etc. Having been in use since the 1980s, bitmap index compression is expected to thrive again in the Big Data era.
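As a rough illustration of the word-aligned run-length idea shared by the WAH family of schemes listed above, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes the commonly described 32-bit WAH layout: a literal word has its most significant bit clear and carries 31 payload bits, while a fill word has that bit set, followed by one bit giving the fill value and 30 bits counting how many identical 31-bit groups it replaces. The names `wah_encode` and `wah_decode` are hypothetical and chosen only for this example.

```python
from typing import List

GROUP_BITS = 31                        # payload bits carried by one 32-bit word
LITERAL_MASK = (1 << GROUP_BITS) - 1   # a group consisting of 31 one-bits
FILL_FLAG = 1 << 31                    # most significant bit set: fill word
FILL_BIT = 1 << 30                     # fill value (run of all-0 or all-1 groups)
MAX_RUN = (1 << 30) - 1                # largest group count a fill word can hold


def wah_encode(bits: List[int]) -> List[int]:
    """Encode a list of 0/1 values into simplified WAH-style 32-bit words."""
    # Pad with zeros so the input splits evenly into 31-bit groups.
    padded = bits + [0] * (-len(bits) % GROUP_BITS)
    words: List[int] = []
    for start in range(0, len(padded), GROUP_BITS):
        group = 0
        for b in padded[start:start + GROUP_BITS]:
            group = (group << 1) | (b & 1)
        if group == 0 or group == LITERAL_MASK:
            fill = FILL_FLAG | (FILL_BIT if group else 0)
            # Extend the previous fill word if it has the same fill value
            # and its 30-bit counter still has room.
            if (words and (words[-1] & ~MAX_RUN) == fill
                    and (words[-1] & MAX_RUN) < MAX_RUN):
                words[-1] += 1
            else:
                words.append(fill | 1)
        else:
            words.append(group)        # mixed group: store as a literal word
    return words


def wah_decode(words: List[int], nbits: int) -> List[int]:
    """Expand the words back into the first nbits bits of the bitmap."""
    bits: List[int] = []
    for w in words:
        if w & FILL_FLAG:
            value = 1 if w & FILL_BIT else 0
            bits.extend([value] * ((w & MAX_RUN) * GROUP_BITS))
        else:
            bits.extend((w >> s) & 1 for s in range(GROUP_BITS - 1, -1, -1))
    return bits[:nbits]


if __name__ == "__main__":
    bitmap = [1] + [0] * 92 + [1] * 62 + [0, 1]   # 157 bits, long runs
    encoded = wah_encode(bitmap)
    print([hex(w) for w in encoded])
    print(wah_decode(encoded, len(bitmap)) == bitmap)
```

In this sketch the 157-bit sample bitmap is stored in four words: two literals and two fill words. The other schemes named above refine this basic layout in different ways, which the sketch does not attempt to reproduce.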
