Mining frequent patterns from network flows for monitoring network

Network traffic is fast-moving, high-volume, short-lived, unpredictable and unbounded, which makes it a serious challenge for network administrators to monitor traffic in real time and judge whether the network as a whole is working well. Most existing techniques in this area are based on signature training, learning or matching, which may be too complicated to meet real-time requirements. Other statistical methods, including sampling, hashing and counting, are approximate and compute an incomplete set of results. Because the main objective of network monitoring is to discover and understand the active events that happen frequently and may degrade or even bring down the whole network, in this paper we use frequent pattern mining to find these events. We first design a sliding window model to keep the mining result up to date and complete; then, taking the distribution and fluidity of network flows into account, we develop a class of algorithms, comprising a vertical re-mining algorithm, a multi-pattern re-mining algorithm, a fast multi-pattern capturing algorithm and a fast multi-pattern capturing supplement algorithm, to deal with the problems that arise when applying frequent pattern mining to network traffic analysis. Finally, we develop a monitoring system to evaluate our algorithms on real traces collected from the campus network of Peking University. The results show that some of the proposed algorithms are effective, and that our system can identify, in a timely manner, a large amount of potentially valuable information that helps network administrators understand regular applications and detect network anomalies. The research in this paper thus provides both a new application area for frequent pattern mining and a new technique for network monitoring.
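To make the combination of a sliding window with vertical mining more concrete, the following is a minimal sketch and not the paper's own algorithms, whose details the abstract does not give. It assumes a count-based window, an absolute support threshold, and flows represented as sets of attribute strings; the class name `SlidingWindowMiner`, its parameters and the example flow attributes are all hypothetical. Each item maps to the set of flow ids that contain it (a vertical layout), and frequent itemsets are found by intersecting those id sets, in the style of Eclat.

```python
from collections import deque
from itertools import count


class SlidingWindowMiner:
    """Keeps the most recent `window_size` flows and mines them by tidset intersection.

    Illustrative only: names and parameters are assumptions, not taken from the paper.
    """

    def __init__(self, window_size, min_support):
        self.window_size = window_size
        self.min_support = min_support  # absolute count within the window
        self.window = deque()           # (tid, items) in arrival order
        self.tidsets = {}               # item -> set of tids currently in the window
        self._next_tid = count()

    def add_flow(self, items):
        """Insert one flow and evict the oldest flow if the window is full."""
        tid = next(self._next_tid)
        items = frozenset(items)
        self.window.append((tid, items))
        for item in items:
            self.tidsets.setdefault(item, set()).add(tid)
        if len(self.window) > self.window_size:
            old_tid, old_items = self.window.popleft()
            for item in old_items:
                tids = self.tidsets[item]
                tids.discard(old_tid)
                if not tids:
                    del self.tidsets[item]

    def frequent_itemsets(self):
        """Return {itemset: support} for all frequent itemsets in the current window."""
        frequent = sorted(
            ((item, tids) for item, tids in self.tidsets.items()
             if len(tids) >= self.min_support),
            key=lambda pair: repr(pair[0]),
        )
        result = {}
        self._extend(frozenset(), frequent, result)
        return result

    def _extend(self, prefix, candidates, result):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Intersect with later items only, so each itemset is generated once.
            narrowed = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= self.min_support:
                    narrowed.append((other, shared))
            self._extend(itemset, narrowed, result)


# Hypothetical flows: each is a set of attribute=value strings.
flows = [
    {"proto=tcp", "dport=80", "src=10.0.0.1"},
    {"proto=tcp", "dport=80", "src=10.0.0.2"},
    {"proto=udp", "dport=53", "src=10.0.0.3"},
    {"proto=tcp", "dport=80", "src=10.0.0.1"},
    {"proto=tcp", "dport=443", "src=10.0.0.4"},
]

miner = SlidingWindowMiner(window_size=4, min_support=2)
for flow in flows:
    miner.add_flow(flow)

patterns = miner.frequent_itemsets()
for itemset, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Because the window holds only four flows, the first flow has been evicted by the time mining runs, so the reported supports reflect the latest four flows only. This sketch re-mines the whole window on every call; avoiding that cost is presumably what the re-mining and fast capturing algorithms named above address, but the sketch makes no attempt to reproduce them.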
