Self-organizing anomaly detection in data streams

Many distributed systems continuously gather, produce, and process data, often in the form of data streams whose characteristics change over time. Detecting anomalous data is fundamental to obtaining critical, actionable information about events such as intrusions, faults, and system failures. This paper proposes a multi-agent algorithm for detecting anomalies in distributed data streams. As data items arrive from any source, they are associated with bio-inspired agents and randomly disseminated onto a virtual space. The loaded agents move on the virtual space and form groups according to the flocking algorithm, grouping on the basis of a predefined notion of similarity between their associated objects. Only agents associated with similar objects form a flock; agents associated with mutually dissimilar objects do not. Anomalies are objects associated with isolated agents, or with agents belonging to flocks that have only a few elements. The swarm-intelligence features of the approach, such as adaptivity, parallelism, asynchronism, and decentralization, make the algorithm scalable to very large data sets and very large distributed systems. Experimental results on real and synthetic datasets confirm the validity of the proposed model.
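
The idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `flock_anomalies`, all of its parameters (`eps`, `visibility`, `join_radius`, `min_flock_size`, `steps`, `seed`), the Euclidean similarity threshold, and the synchronous update are assumptions made for illustration. The movement rule is also a strong simplification of Reynolds-style flocking: it keeps only cohesion toward visible, similar agents, with a random walk for agents that see no similar neighbor, and omits alignment and separation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flock_anomalies(items, eps, visibility=0.3, join_radius=0.05,
                    min_flock_size=3, steps=50, seed=0):
    """Return sorted indices of items whose agents end up isolated
    or in flocks smaller than min_flock_size (all names are assumptions)."""
    rng = random.Random(seed)
    n = len(items)
    # Each data item is loaded onto an agent placed at random
    # on a 2-D virtual space (here the unit square).
    pos = [(rng.random(), rng.random()) for _ in range(n)]

    def similar(i, j):
        # Assumed similarity concept: feature distance within eps.
        return euclidean(items[i], items[j]) <= eps

    for _ in range(steps):
        new_pos = []
        for i in range(n):
            # Flock mates: agents that are both visible and similar.
            mates = [j for j in range(n)
                     if j != i
                     and euclidean(pos[i], pos[j]) <= visibility
                     and similar(i, j)]
            if mates:
                # Cohesion: move halfway toward the mates' centroid.
                cx = sum(pos[j][0] for j in mates) / len(mates)
                cy = sum(pos[j][1] for j in mates) / len(mates)
                x = pos[i][0] + 0.5 * (cx - pos[i][0])
                y = pos[i][1] + 0.5 * (cy - pos[i][1])
            else:
                # No similar agent in sight: take a small random step.
                x = pos[i][0] + rng.uniform(-0.1, 0.1)
                y = pos[i][1] + rng.uniform(-0.1, 0.1)
            # Keep the agent inside the unit square.
            new_pos.append((min(1.0, max(0.0, x)), min(1.0, max(0.0, y))))
        pos = new_pos  # synchronous update of all agents

    # A flock is a connected group of agents that are pairwise linked
    # by being similar AND close to each other on the virtual space.
    flock_of = [None] * n
    flocks = []
    for start in range(n):
        if flock_of[start] is not None:
            continue
        flock_of[start] = len(flocks)
        members, stack = [], [start]
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if (flock_of[j] is None and similar(i, j)
                        and euclidean(pos[i], pos[j]) <= join_radius):
                    flock_of[j] = len(flocks)
                    stack.append(j)
        flocks.append(members)

    # Anomalies: items carried by isolated agents or by small flocks.
    return sorted(i for f in flocks if len(f) < min_flock_size for i in f)
```

For example, calling `flock_anomalies(items, eps=0.5, visibility=2.0)` on a handful of nearby points plus one distant point flags only the distant point, because its agent never finds a similar neighbor and stays isolated. With a smaller `visibility`, agents see only part of the virtual space, so whether similar agents meet within `steps` iterations depends on the random placement; the values used here are illustrative, not tuned.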
