A Survey on Anomaly Detection in Evolving Data: [with Application to Forest Fire Risk Prediction]

Traditionally, most anomaly detection algorithms have been designed for 'static' datasets, in which all observations are available at one time. In non-stationary environments, on the other hand, the same algorithms cannot be applied, because the underlying data distributions change constantly and a model fitted at one point in time does not remain valid. Hence, we need to devise adaptive models that take into account the dynamically changing characteristics of environments and detect anomalies in 'evolving' data. Over the last two decades, many algorithms have been proposed to detect anomalies in evolving data. Some of them consider scenarios where a sequence of objects (called a data stream) with one or multiple features evolves over time, whereas others concentrate on more complex scenarios, where streaming objects with one or multiple features have causal/non-causal relationships with each other. The latter can be represented as evolving graphs. In this paper, we categorize existing strategies for detecting anomalies in both scenarios, including the state-of-the-art techniques. Since label information is mostly unavailable in real-world applications when data evolves, we review the unsupervised approaches in this paper. We then present an application example, forest fire risk prediction, and conclude the paper with future research directions in this field for researchers and industry.
