Threshold Sampling for Network Streaming Data Analysis

Network streaming data are traffic records collected from high-speed network links. They arrive continuously and in very large volumes. The key to analyzing such data is to construct a smaller, well-organized subset that retains the information most needed to answer a specific type of query quickly. In this paper, we propose a threshold sampling algorithm for network streaming data analysis. With threshold sampling, the analysis concentrates on large traffic without neglecting small traffic. We evaluate the algorithm on the task of identifying frequent items in order to detect super sources and super destinations in network streaming data. Comparing threshold sampling with traditional sampling methods, we conclude that the proposed method is more self-adaptive and gives better control over resource consumption without sacrificing accuracy.
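
To make the idea of focusing on large traffic without neglecting small traffic concrete, the following is a minimal sketch of the classic size-dependent threshold sampling scheme, which may differ from the exact algorithm proposed in this paper. It assumes each record is a (key, size) pair and that a fixed threshold z is given. The function name threshold_sample, the record layout, and the fixed threshold are illustrative assumptions, not taken from the paper. A record of size at least z is always kept. A smaller record is kept with probability size / z and is then counted as z, so the estimated total per key remains unbiased.

```python
import random
from collections import defaultdict


def threshold_sample(records, z, rng=None):
    """Sketch of size-dependent threshold sampling (hypothetical helper).

    records: iterable of (key, size) pairs, e.g. (source address, bytes).
    z:       positive sampling threshold (assumed fixed here).
    Returns (estimates, kept): estimated total size per key and the
    number of records that were actually retained.
    """
    if z <= 0:
        raise ValueError("threshold z must be positive")
    rng = rng or random.Random()
    estimates = defaultdict(float)
    kept = 0
    for key, size in records:
        if size >= z:
            # Large records are always kept and counted at their true size.
            estimates[key] += size
            kept += 1
        elif rng.random() < size / z:
            # Small records survive with probability size / z and are
            # counted as z, so their expected contribution equals size.
            estimates[key] += z
            kept += 1
    return dict(estimates), kept


def top_keys(estimates, k):
    """Return the k keys with the largest estimated totals."""
    return sorted(estimates, key=estimates.get, reverse=True)[:k]
```

Raising z reduces the number of retained records, which is one way to trade accuracy on small traffic for lower resource consumption. Candidate super sources can then be read off with top_keys(estimates, k), where the keys are assumed to be source addresses.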