Unsupervised Network Anomaly Detection in Real-Time on Big Data

Network anomaly detection typically relies on intrusion detection systems built on knowledge databases. Building this knowledge takes time, however, because it requires manual inspection by experts. Current detection systems cannot cope with zero-day attacks or new user behavior, and as a consequence they may fail to detect intrusions correctly. Unsupervised network anomaly detectors overcome this issue because they require no prior knowledge. On the other hand, these systems can be very slow, since they must learn traffic patterns to acquire the knowledge needed to detect anomalous flows. To improve speed, they are often exposed only to sampled traffic, so harmful traffic may escape examination by the detector. In this paper, we propose to take advantage of a new distributed computing framework to speed up an Unsupervised Network Anomaly Detector Algorithm, UNADA. The evaluation shows that the execution time can be improved by a factor of 13, allowing UNADA to process large traffic traces in real time.
