READ: a three-communicating-stage distributed super points detections algorithm

A super point is a host that interacts with a far larger number of counterparts than other hosts in the network over a period of time. Super point detection plays an important role in network research and applications. As network scale increases, distributed super point detection has become an active research topic. Compared with single-node super point detection, the main difficulty in a multi-node distributed environment is reducing communication overhead. This paper therefore proposes a distributed super point detection algorithm with three communication stages: the Rough Estimator based Asynchronous Distributed super point detection algorithm (READ). READ uses a lightweight estimator, the Rough Estimator (RE), which is fast to compute and uses little memory, to generate candidate super points. The Linear Estimator (LE) is then used to accurately estimate the cardinality of each candidate, so that super points are detected correctly. In READ, each node scans IP address pairs asynchronously. When the time window boundary is reached, READ starts its three-stage communication to detect super points. We prove that the accuracy of READ in a distributed environment is no lower than in a single-node environment. Four groups of real-world high-speed network traffic, captured at 10 Gb/s and 40 Gb/s, are used to test READ. The experimental results show that READ not only achieves higher accuracy in a distributed environment, but also incurs less than 5% of the communication overhead of existing algorithms.
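
To make the cardinality step concrete, the following is a minimal sketch of a linear (bitmap-based) estimator in the style of linear-time probabilistic counting, together with a bitwise-OR merge of per-node bitmaps. It is an illustration under stated assumptions, not READ's actual implementation: the class name `LinearEstimator`, the bitmap size `m`, the SHA-256 hash, and the `merge` method are all hypothetical choices, and the abstract does not specify how READ's three communication stages exchange estimator state.

```python
import hashlib
import math


class LinearEstimator:
    """Hypothetical linear-counting sketch; names and sizes are assumptions."""

    def __init__(self, m: int = 4096):
        self.m = m               # number of bits in the bitmap
        self.bits = [0] * m      # all bits start at zero

    def add(self, peer: str) -> None:
        # Hash the peer address to one bit position and set that bit.
        digest = hashlib.sha256(peer.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % self.m
        self.bits[index] = 1

    def merge(self, other: "LinearEstimator") -> "LinearEstimator":
        # Bitwise OR of two bitmaps equals the bitmap built from the
        # union of both peer sets, so merging loses no information.
        if self.m != other.m:
            raise ValueError("bitmaps must have the same size")
        merged = LinearEstimator(self.m)
        merged.bits = [a | b for a, b in zip(self.bits, other.bits)]
        return merged

    def estimate(self) -> float:
        # Linear counting: n ~ -m * ln(fraction of zero bits).
        zeros = self.bits.count(0)
        if zeros == 0:
            return float("inf")  # bitmap saturated; estimate undefined
        return -self.m * math.log(zeros / self.m)


# Two observation nodes see overlapping peers of the same host.
node_a, node_b, single = LinearEstimator(), LinearEstimator(), LinearEstimator()
for i in range(200):
    node_a.add(f"10.0.0.{i}")
    single.add(f"10.0.0.{i}")
for i in range(100, 300):
    node_b.add(f"10.0.0.{i}")
    single.add(f"10.0.0.{i}")

merged = node_a.merge(node_b)
print(round(merged.estimate()), round(single.estimate()))
```

In this sketch the merged bitmap is identical to the one a single node would have built from all the traffic, which gives some intuition for why a distributed estimate of this kind need not be less accurate than a single-node one.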
