Network Anomaly Detection Using Co-clustering

Early Internet architecture design goals did not make security a high priority. Today, however, Internet security is a quickly growing concern. The prevalence of Internet attacks has increased significantly, yet the challenge of detecting such attacks still generally falls on end hosts and service providers, leaving system administrators to detect and block attacks on their own. In particular, as social networks have become central hubs of information and communication, they are increasingly the target of attention and attacks. This creates the challenge of carefully distinguishing malicious connections from normal ones. Previous work has shown that, for a variety of Internet attacks, a small subset of connection measurements are good indicators of whether or not a connection is part of an attack. In this paper we examine the effectiveness of two different co-clustering algorithms that both cluster connections and mark which connection measurements are strong indicators of what makes a given cluster anomalous relative to the total data set. We run experiments with these co-clustering algorithms on the KDD 1999 Cup data set. We find that soft co-clustering, run on samples of the data, consistently identifies parameters that are strong indicators of anomalous connections and creates highly pure clusters. When we run hard co-clustering on the full data set, we obtain, on average over 100 runs, one cluster with 92.44% attack connections and another with 75.84% normal connections. These results are on par with the winning entry of the KDD 1999 Cup, showing that co-clustering is a strong unsupervised method for separating normal connections from anomalous ones. Finally, we believe that the ideas presented in this work may inspire research on anomaly detection in social networks, such as identifying spammers and fraudsters.
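As a rough illustration of hard co-clustering, the Python sketch below (assuming NumPy) co-clusters a connections-by-measurements matrix with a simple alternating scheme that minimizes the squared residue between each entry and the mean of its (row cluster, column cluster) block. This is a generic, swapped-in technique and not necessarily either of the algorithms evaluated in this paper; the functions `block_means`, `hard_cocluster` and `attack_fraction`, and the toy data, are hypothetical. Labels are used only after clustering, to measure the fraction of attack connections in each row cluster, which is how the cluster percentages above are expressed.

```python
import numpy as np

# Sketch only: a generic block-mean co-clustering, not the paper's exact algorithm.


def block_means(X, rows, cols, k, l):
    """Mean of X over each (row cluster, column cluster) block; empty blocks are 0."""
    M = np.zeros((k, l))
    for r in range(k):
        for c in range(l):
            block = X[np.ix_(rows == r, cols == c)]
            M[r, c] = block.mean() if block.size else 0.0
    return M


def hard_cocluster(X, k, l, n_iter=20, seed=0):
    """Alternately reassign rows and columns to reduce
    sum_ij (X[i, j] - M[rows[i], cols[j]])**2, where M holds the block means."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    rows = rng.integers(k, size=n)  # random initial row (connection) clusters
    cols = rng.integers(l, size=m)  # random initial column (measurement) clusters
    for _ in range(n_iter):
        # Row step: cost of putting row i in row cluster r, block means held fixed.
        M = block_means(X, rows, cols, k, l)
        P = M[:, cols]  # (k, m): block mean seen by each column under each row cluster
        row_cost = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        rows = row_cost.argmin(axis=1)
        # Column step: cost of putting column j in column cluster c.
        M = block_means(X, rows, cols, k, l)
        Q = M[rows, :]  # (n, l): block mean seen by each row under each column cluster
        col_cost = ((X[:, :, None] - Q[:, None, :]) ** 2).sum(axis=0)  # (m, l)
        cols = col_cost.argmin(axis=1)
    return rows, cols


def attack_fraction(rows, is_attack, k):
    """Fraction of labeled attacks in each row cluster (NaN if the cluster is empty)."""
    return np.array([is_attack[rows == r].mean() if np.any(rows == r) else np.nan
                     for r in range(k)])


# Hypothetical toy data: 50 "normal" and 50 "attack" connections with 6 measurements;
# only measurements 0 and 1 shift under attack.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 0.1, size=(100, 6))
X[50:, :2] += 5.0
is_attack = np.r_[np.zeros(50), np.ones(50)]  # used for evaluation only

rows, cols = hard_cocluster(X, k=2, l=2)
print("attack fraction per row cluster:", attack_fraction(rows, is_attack, 2))
print("column cluster of each measurement:", cols)
```

On this toy matrix the column clusters should separate the two shifted measurements from the other four, which is the sense in which co-clustering marks the measurements that make a cluster anomalous. Real KDD 1999 Cup features would likely need scaling first, because a squared-residue objective is sensitive to feature magnitudes.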
