Intrusion Detection System with Data Stream Clustering Approach

Fast, high-quality intrusion detection algorithms play an important role in security management by organizing large amounts of information into a small number of meaningful clusters. In particular, clustering algorithms that build meaningful groups of data from network log files are well suited to interactive visualization and exploration, as they provide a powerful mechanism for detecting malicious sessions. This paper focuses on data stream algorithms that build such detection solutions. It (i) presents a comprehensive study of data stream clustering algorithms that use different functions and schemes to solve different problems in this area, and (ii) presents a new class of clustering algorithms, called Divide and Conquer stream clustering algorithms, which combine features of both partitional and agglomerative approaches. This combination allows them to reduce the early-stage errors made by agglomerative methods and hence improve the quality of the clustering solutions. The experimental evaluation shows that the proposed method leads to better solutions than previous algorithms, making it well suited to clustering large volumes of network log data because of both its relatively low computational requirements and its higher clustering quality. Furthermore, the proposed method consistently leads to better solutions even when a window of data contains no clusters and the data is monotonous.
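The abstract does not give the details of the Divide and Conquer stream clustering algorithm, so the following is only a minimal sketch of the general idea it describes: a partitional step summarizes each window of the stream, and an agglomerative step then merges those summaries. The choice of Lloyd's k-means for the partitional step, centroid-distance merging for the agglomerative step, and all names and parameters (`kmeans`, `agglomerate`, `cluster_stream`, `window_size`, `micro_k`, `final_k`) are assumptions made for illustration, not the authors' method.

```python
import math
import random


def mean(group):
    """Coordinate-wise mean of a non-empty list of points (tuples)."""
    return tuple(sum(xs) / len(group) for xs in zip(*group))


def kmeans(points, k, iters=10, seed=0):
    """Partitional step (assumed): Lloyd's k-means on one window.

    Returns a list of (centroid, weight) summaries, one per non-empty cluster.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, min(k, len(points)))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return [(c, len(g)) for c, g in zip(centroids, groups) if g]


def agglomerate(summaries, k):
    """Agglomerative step (assumed): merge the two closest weighted
    centroids repeatedly until at most k clusters remain."""
    clusters = list(summaries)
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = min(pairs, key=lambda ab: math.dist(clusters[ab[0]][0],
                                                   clusters[ab[1]][0]))
        (ci, wi), (cj, wj) = clusters[i], clusters[j]
        merged = (tuple((x * wi + y * wj) / (wi + wj)
                        for x, y in zip(ci, cj)), wi + wj)
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters


def cluster_stream(stream, window_size, micro_k, final_k):
    """Divide the stream into windows, summarize each window with k-means,
    then conquer by merging all window summaries agglomeratively."""
    summaries, window = [], []
    for point in stream:
        window.append(point)
        if len(window) == window_size:
            summaries.extend(kmeans(window, micro_k))
            window = []
    if window:  # summarize the final, partially filled window
        summaries.extend(kmeans(window, micro_k))
    return agglomerate(summaries, final_k)
```

In this sketch, calling `cluster_stream(records, window_size=50, micro_k=4, final_k=2)` on an iterable of numeric feature tuples returns a list of `(centroid, weight)` pairs. Because the agglomerative step only ever sees a handful of weighted centroids per window rather than the raw records, its quadratic pair search stays cheap, which is one plausible reading of the low computational cost the abstract claims.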
