BAH: A Bitmap Index Compression Algorithm for Fast Data Retrieval

Efficient retrieval of archived traffic data is essential for detecting network attacks such as advanced persistent threat (APT) attacks. To gain insight from Internet traffic, bitmap indexes are increasingly used for efficient querying over large datasets. However, a raw bitmap index consumes a large amount of space and incurs high overhead when indexes are loaded. Various bitmap index compression algorithms have been proposed to save storage while improving query efficiency. This paper proposes a new bitmap index compression algorithm called BAH (Byte Aligned Hybrid compression coding). An acceleration algorithm using SIMD is designed to increase the efficiency of the AND operation over multiple compressed bitmaps. Overall, BAH achieves a better compression ratio and faster intersection queries than several previous works, such as WAH, PLWAH, COMPAX, and Roaring. The theoretical analysis shows that the space required by BAH is no larger than 1.6 times the information entropy of the bitmap for bitmaps with density greater than 0.2%. In the experiments, BAH saves about 65% and 60% of the space, respectively, compared with WAH on two datasets. The experiments also demonstrate the query efficiency of BAH in applications to Internet traffic and Web pages.
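
To give a sense of scale for the entropy bound stated above, the following is a minimal sketch, not taken from the paper. It assumes that each bit of the bitmap is set independently with probability equal to the density, so that the information entropy is the binary (Shannon) entropy per bit multiplied by the bitmap length. The function names `bitmap_entropy_bits` and `bound_bits` are hypothetical, and the factor 1.6 is taken directly from the abstract.

```python
import math


def bitmap_entropy_bits(n_bits: int, density: float) -> float:
    """Entropy, in bits, of an n_bits-long bitmap under an assumed
    independent-bits model where each bit is 1 with probability `density`."""
    if density <= 0.0 or density >= 1.0:
        return 0.0  # a constant bitmap carries no information
    per_bit = -(density * math.log2(density)
                + (1.0 - density) * math.log2(1.0 - density))
    return n_bits * per_bit


def bound_bits(n_bits: int, density: float, factor: float = 1.6) -> float:
    """Upper bound on compressed size implied by 'factor times the entropy'.
    The factor 1.6 comes from the abstract; the model is an assumption."""
    return factor * bitmap_entropy_bits(n_bits, density)


if __name__ == "__main__":
    n = 1_000_000
    d = 0.002  # the 0.2% density threshold mentioned in the abstract
    print(f"entropy: {bitmap_entropy_bits(n, d):.0f} bits")
    print(f"bound:   {bound_bits(n, d):.0f} bits")
```

Under this assumed model, a bitmap at the 0.2% density threshold carries roughly 0.021 bits of information per bit, so the stated bound corresponds to a compressed size of a few percent of the raw bitmap.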
