A New Sketch Method for Measuring Host Connection Degree Distribution

The host connection degree distribution (HCDD), that is, the distribution of the number of distinct peers each host connects to, is an important metric for network security monitoring. However, obtaining the HCDD accurately and in real time is difficult on high-speed links that carry massive volumes of traffic. In this paper, we propose a new sketch method that builds a probabilistic summary of each host's flows using a uniform Flajolet-Martin sketch combined with a small bitmap. To compare its performance with previous sampling and sketch methods, we present a general model that encompasses all of these methods. Using this model, we compute the Cramér-Rao lower bounds and the variances of the HCDD estimators. Theoretical analysis and numerical experiments show that, with the same memory usage, our sketch method is six times more accurate than state-of-the-art methods.
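The abstract does not spell out the construction, so the Python below is only a hypothetical illustration of the general idea of pairing a Flajolet-Martin sketch with a small bitmap per host. It is not the paper's uniform FM sketch. The class `HostSketch`, the function `estimate_hcdd`, the BLAKE2 hash, the parameter values (64-bit bitmap, 16 PCSA registers), and the switching rule are all assumptions. The small bitmap is read with linear counting while it is sparse. Once it fills, the sketch falls back to a standard PCSA (stochastic-averaging FM) estimate, and the HCDD is taken as a naive plug-in histogram of the per-host estimates.

```python
import hashlib
import math
from collections import Counter, defaultdict

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def _hash64(item, seed):
    """Deterministic 64-bit hash of `item` under an integer seed (assumed hash choice)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def _rho(x, width):
    """Index of the least-significant 1 bit of x (width if x == 0)."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


class HostSketch:
    """Hypothetical per-host summary: a small bitmap plus a PCSA-style FM sketch."""

    def __init__(self, bitmap_bits=64, fm_registers=16, fm_width=32):
        self.m = bitmap_bits
        self.k = fm_registers
        self.width = fm_width
        self.bitmap = 0               # small bitmap, read with linear counting
        self.fm = [0] * fm_registers  # each entry is one FM bit vector

    def add(self, dst):
        """Record a destination; repeated destinations leave the state unchanged."""
        h1 = _hash64(dst, seed=1)
        self.bitmap |= 1 << (h1 % self.m)
        h2 = _hash64(dst, seed=2)
        reg = h2 % self.k              # stochastic averaging: choose a register
        r = _rho(h2 // self.k, self.width)
        if r < self.width:
            self.fm[reg] |= 1 << r

    def _fm_estimate(self):
        """PCSA estimate (k / PHI) * 2^(mean R_i)."""
        total = 0
        for v in self.fm:
            r = 0                      # R_i = index of the lowest 0 bit
            while v & (1 << r):
                r += 1
            total += r
        return self.k / PHI * 2 ** (total / self.k)

    def estimate(self, min_zero_fraction=0.3):
        """Linear counting while the bitmap is sparse, FM estimate once it fills."""
        zero_fraction = 1 - bin(self.bitmap).count("1") / self.m
        if zero_fraction >= min_zero_fraction:
            return -self.m * math.log(zero_fraction)
        return self._fm_estimate()


def estimate_hcdd(pairs):
    """Naive plug-in HCDD: fraction of hosts at each rounded degree estimate."""
    sketches = defaultdict(HostSketch)
    for src, dst in pairs:
        sketches[src].add(dst)
    degrees = {host: s.estimate() for host, s in sketches.items()}
    hist = Counter(max(1, round(d)) for d in degrees.values())
    return degrees, {deg: c / len(degrees) for deg, c in sorted(hist.items())}
```

Feeding `estimate_hcdd` a list of `(source, destination)` pairs returns the per-host degree estimates and the fraction of hosts at each rounded degree. The 0.3 switching threshold and the register count are untuned assumptions. A histogram of noisy per-host estimates also smears the true distribution, so this serves only as a baseline picture, not as an estimator of the kind analysed in the abstract.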
