BreadZip: a combination of network traffic data and bitmap index encoding algorithm

The rapid evolution of computers and mobile devices has caused an explosive increase in network traffic, making it increasingly necessary to archive traffic for analyzing network events and for supporting many emerging applications. Compression is fundamental to a traffic archival solution because it saves storage space, and indexing accelerates search queries over the archived traffic. In this paper, we propose BreadZip (blocks row-reordering and adaptive index zip), which combines compression of the raw traffic data with compression of its index. BreadZip has three main features: 1) to improve compression efficiency and reduce the memory footprint, traffic data is reordered and divided into fixed-size blocks; 2) to accelerate queries, an improved bitmap index that is smaller than a traditional bitmap index is introduced; 3) to save space, the traffic blocks and the bitmap indexes are each compressed with a different simple run-length encoding method. Our empirical results on network traffic from CAIDA (Cooperative Association for Internet Data Analysis) show that our solution significantly reduces the volume of traffic data while preserving the ability to perform selective queries with response times on the order of seconds.
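
The three steps named in the abstract (row reordering into fixed-size blocks, a per-block bitmap index, and run-length encoding of both) can be illustrated with a minimal Python sketch. This is not the BreadZip implementation: the abstract does not specify the reordering rule, the index layout, or the exact encodings, so the lexicographic sort, the one-bitmap-per-distinct-value index, the plain (value, run length) encoding, and all function names and sample records below are assumptions chosen only for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def build_blocks(records, block_size):
    """Reorder rows (assumed here: lexicographic sort) and cut fixed-size blocks."""
    ordered = sorted(records)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def compress_block(block):
    """Run-length encode each column of a block separately."""
    return [rle_encode(column) for column in zip(*block)]


def bitmap_index(block, column):
    """Build one RLE-compressed bitmap per distinct value of a column."""
    index = {}
    for value in {row[column] for row in block}:
        bits = [1 if row[column] == value else 0 for row in block]
        index[value] = rle_encode(bits)
    return index


def query(blocks, indexes, value):
    """Return rows whose indexed column equals `value`, skipping blocks without it."""
    hits = []
    for block, index in zip(blocks, indexes):
        runs = index.get(value)
        if runs is None:
            continue  # the index shows this block holds no matching row
        bits = rle_decode(runs)
        hits.extend(row for row, bit in zip(block, bits) if bit)
    return hits


# Hypothetical flow records: (source address, destination port, protocol).
records = [
    ("10.0.0.2", 443, "tcp"),
    ("10.0.0.1", 80, "tcp"),
    ("10.0.0.1", 80, "tcp"),
    ("10.0.0.2", 53, "udp"),
    ("10.0.0.1", 443, "tcp"),
]

blocks = build_blocks(records, block_size=4)
compressed = [compress_block(b) for b in blocks]
indexes = [bitmap_index(b, column=1) for b in blocks]  # index the port column
matches = query(blocks, indexes, value=443)
```

Sorting before blocking places equal values next to each other, which lengthens the runs in both the data columns and the bitmaps; that is the general reason reordering helps run-length encoding. In this sketch the query reads uncompressed blocks for brevity; an archival system would decompress only the blocks that the index selects.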
