Learning the Daily Model of Network Traffic

Anomaly detection relies on profiles that represent the normal behaviour of users, hosts or networks, and it detects attacks as significant deviations from these profiles. In this paper we propose a methodology that applies several data mining methods to construct the “normal” model of the incoming traffic of a department-level network. The methodology returns a daily model of the network traffic as the result of four main steps: first, daily network connections are reconstructed from the TCP/IP packet headers passing through the firewall and represented as feature vectors; second, network connections are grouped by a clustering method; third, clusters are described as sets of rules generated by a supervised inductive learning algorithm; fourth, rules are transformed into symbolic objects, and similarities between symbolic objects are computed for each pair of days. The result is a longitudinal model of the similarity of network connections that a network administrator can use to identify deviations in network traffic patterns that may demand attention. The proposed methodology has been tested on the firewall log files of our University Department.
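To make the first and fourth steps more concrete, the following Python sketch groups packet headers into connections and compares two days. It is only an illustration under stated assumptions, not the method of the paper: the packet record layout, the chosen features, and the function names are hypothetical, and the clustering and rule-learning steps are replaced by a direct per-day summary (a set of destination ports and an interval of packet counts) compared with a simple Jaccard-style dissimilarity.

```python
from collections import defaultdict

# Hypothetical packet header record (an assumption, not the paper's format):
# (timestamp, src_ip, src_port, dst_ip, dst_port, length_in_bytes)

def reconstruct_connections(packets):
    """Group packet headers by their 4-tuple and summarise each group
    as a small feature vector (one dict per connection)."""
    groups = defaultdict(list)
    for ts, src, sport, dst, dport, length in packets:
        groups[(src, sport, dst, dport)].append((ts, length))
    connections = []
    for (_src, _sport, _dst, dport), items in groups.items():
        times = [t for t, _ in items]
        connections.append({
            "dst_port": dport,
            "n_packets": len(items),
            "n_bytes": sum(size for _, size in items),
            "duration": max(times) - min(times),
        })
    return connections

def day_profile(connections):
    """Crude stand-in for a symbolic description of a day:
    a set-valued variable (ports) and an interval-valued one (packet counts)."""
    ports = {c["dst_port"] for c in connections}
    counts = [c["n_packets"] for c in connections] or [0]
    return {"ports": ports, "packets": (min(counts), max(counts))}

def dissimilarity(a, b):
    """Mean of a Jaccard distance on the port sets and a normalised
    non-overlap measure on the packet-count intervals (value in [0, 1])."""
    union = a["ports"] | b["ports"]
    common = a["ports"] & b["ports"]
    d_ports = 1 - len(common) / len(union) if union else 0.0

    (lo_a, hi_a), (lo_b, hi_b) = a["packets"], b["packets"]
    overlap = max(0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    d_interval = 1 - overlap / span if span else 0.0
    return (d_ports + d_interval) / 2

if __name__ == "__main__":
    # Two tiny made-up days of traffic, for illustration only.
    day1 = [(0.0, "10.0.0.1", 5000, "10.0.0.9", 80, 60),
            (0.5, "10.0.0.1", 5000, "10.0.0.9", 80, 1500),
            (2.0, "10.0.0.2", 5001, "10.0.0.9", 22, 60)]
    day2 = [(0.0, "10.0.0.3", 6000, "10.0.0.9", 80, 60),
            (1.0, "10.0.0.3", 6000, "10.0.0.9", 80, 900),
            (3.0, "10.0.0.4", 6001, "10.0.0.9", 443, 60)]
    p1 = day_profile(reconstruct_connections(day1))
    p2 = day_profile(reconstruct_connections(day2))
    print(dissimilarity(p1, p2))
```

Computing this value for every pair of days yields a matrix of day-to-day dissimilarities, which is the kind of longitudinal view the abstract describes; a day that is unusually dissimilar from its neighbours would be a candidate for closer inspection.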
