Characterizing network traffic by means of the NetMine framework

The NetMine framework characterizes network traffic data by means of data mining techniques. NetMine performs generalized association rule extraction to profile communications, detect anomalies, and identify recurrent patterns. Association rule extraction is a widely used exploratory technique for discovering hidden correlations among data. However, it is usually driven by frequency constraints on the extracted correlations. Hence, it entails either (i) generating a huge number of rules, which are difficult to analyze, or (ii) pruning rare itemsets even if their hidden knowledge might be relevant. To overcome these issues, NetMine exploits a novel algorithm to efficiently extract generalized association rules, which provide a high-level abstraction of the network traffic and allow the discovery of unexpected and more interesting traffic rules. The proposed technique exploits user-provided taxonomies to drive the pruning phase of the extraction process. Extracted correlations are automatically aggregated into more general association rules according to a frequency threshold. Finally, extracted rules are classified into groups according to their semantic meaning, thus allowing a domain expert to focus on the most relevant patterns. Experiments performed on different network dumps showed the efficiency and effectiveness of the NetMine framework in characterizing traffic data.
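
The idea of aggregating infrequent correlations instead of pruning them can be illustrated with a minimal Python sketch. It is not the NetMine algorithm, which the abstract does not specify. It only counts pairs of flow attributes and rolls infrequent pairs up one taxonomy level, and it stops at frequent itemsets without deriving rules or confidence values. The taxonomy, the item names, the sample flows, and the function names are all hypothetical, and the sketch assumes each flow carries at most one value per attribute.

```python
from collections import Counter
from itertools import combinations

# Hypothetical one-level taxonomy: concrete item -> more general item.
TAXONOMY = {
    "dst_port=80": "dst_port=web",
    "dst_port=443": "dst_port=web",
    "dst_port=22": "dst_port=remote_access",
    "src_ip=10.0.0.1": "src_ip=10.0.0.0/24",
    "src_ip=10.0.0.2": "src_ip=10.0.0.0/24",
}

# Hypothetical flows, each described by a set of attribute=value items.
SAMPLE_FLOWS = [
    {"src_ip=10.0.0.1", "dst_port=80"},
    {"src_ip=10.0.0.2", "dst_port=443"},
    {"src_ip=10.0.0.1", "dst_port=443"},
    {"src_ip=10.0.0.1", "dst_port=22"},
]


def generalize(itemset):
    """Replace every item that has a taxonomy parent with that parent."""
    return frozenset(TAXONOMY.get(item, item) for item in itemset)


def mine_pairs(flows, min_support):
    """Return {2-itemset: count} for pairs meeting min_support.

    Pairs below the threshold are not discarded: they are generalized
    one taxonomy level and their counts are summed, so a rare but
    related group of pairs can surface as one more general pair.
    """
    counts = Counter()
    for flow in flows:
        for pair in combinations(sorted(flow), 2):
            counts[frozenset(pair)] += 1

    # Pairs that are frequent at the most specific level are kept as-is.
    frequent = {s: c for s, c in counts.items() if c >= min_support}

    # Infrequent pairs are aggregated under their generalized form.
    rolled_up = Counter()
    for itemset, count in counts.items():
        if count < min_support:
            general = generalize(itemset)
            if general != itemset:
                rolled_up[general] += count

    for itemset, count in rolled_up.items():
        if count >= min_support:
            frequent[itemset] = count
    return frequent


if __name__ == "__main__":
    for itemset, count in mine_pairs(SAMPLE_FLOWS, min_support=3).items():
        print(sorted(itemset), count)
```

With a support threshold of 3, no concrete pair in the sample flows is frequent, but the three web flows from the same subnet aggregate into a single generalized pair. A plain frequency-based miner would have pruned all of them.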
