Online Mining of Risk Level of Traffic Anomalies with User's Feedbacks

Traffic anomalies are regarded as an important risk indicator in computer networks. When streams of traffic data are collected in a network, unsupervised online detection of possible risks is crucial to prompt resolution. Current anomaly detection techniques that use a positive security method suffer from a high false alarm rate when a high detection rate is pursued. This paper presents a heuristic risk assessment model for a spatiotemporal environment that combines an anomaly detection model with user feedback on historical events. The proposed operations rely solely on a synopsis of the data stream profile, characterized by a dynamic Markov chain in which each state denotes a representative granule in the data space. The Markov property is used to determine the granule size that yields optimal performance. The model is efficient, incremental, and scalable, and is thus suitable for soft real-time processing. Experiments conducted with VoIP CDR (Call Detail Records) data provided by Cisco Systems show that, compared with a positive security-based anomaly detection model, the proposed model significantly reduces the false alarm rate while retaining a high detection rate.
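The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea, not the authors' model. It assumes a fixed granule radius, Euclidean distance, an anomaly score of one minus the observed transition probability, and a simple feedback weight; the class name `GranuleMarkovChain` and all of its methods and parameters are hypothetical. In particular, the sketch does not use the Markov property to choose the granule size, as the paper does.

```python
import math
from collections import defaultdict


class GranuleMarkovChain:
    """Hypothetical online Markov chain whose states are granules (clusters)."""

    def __init__(self, radius):
        self.radius = radius                  # assumed fixed granule size
        self.centroids = []                   # one representative point per state
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.false_alarms = defaultdict(int)  # feedback: alerts dismissed by users
        self.confirmed = defaultdict(int)     # feedback: alerts confirmed as risks
        self.current = None                   # index of the most recent state

    def _nearest(self, point):
        """Return (index, distance) of the closest centroid, or (None, inf)."""
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def _feedback_weight(self, state):
        """Assumed weighting: 1.0 without feedback, lower after false alarms."""
        c, f = self.confirmed[state], self.false_alarms[state]
        return (c + 1) / (c + f + 1)

    def feedback(self, state, is_risk):
        """Record a user's verdict on a past alert raised in `state`."""
        if is_risk:
            self.confirmed[state] += 1
        else:
            self.false_alarms[state] += 1

    def observe(self, point):
        """Consume one data point; return (state, anomaly_score, risk)."""
        state, dist = self._nearest(point)
        if state is None or dist > self.radius:
            # No granule covers the point: create a new state, maximally anomalous.
            self.centroids.append(tuple(point))
            state = len(self.centroids) - 1
            score = 1.0
        elif self.current is None:
            score = 0.0
        else:
            # Score by how rarely this transition has been seen so far.
            row = self.transitions[self.current]
            total = sum(row.values())
            prob = row[state] / total if total else 0.0
            score = 1.0 - prob
        if self.current is not None:
            self.transitions[self.current][state] += 1  # incremental update
        self.current = state
        return state, score, score * self._feedback_weight(state)


chain = GranuleMarkovChain(radius=1.0)
for p in [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (10.0, 10.0)]:
    print(chain.observe(p))
```

Each call to `observe` touches only the stored centroids and one row of transition counts, which is what keeps the update incremental. Calling `chain.feedback(state, is_risk=False)` after a dismissed alert lowers the risk reported for later anomalies in the same state, while the raw anomaly score is unaffected.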
